Abstract

Consider semiparametric models that display local asymptotic exponentiality (Ibragimov and Has'minskii (1981) [15]), an asymptotic property of the likelihood associated with discontinuities of densities. Our interest goes to estimation of the location of such discontinuities while other aspects of the density form a nuisance parameter. It is shown that under conditions on model and prior comparable to those of Bickel and Kleijn (2012) [1], the posterior distribution displays Bernstein-von Mises-type asymptotic behaviour, with exponential distributions as the limiting sequence. Results are applied to semiparametric LAE location and scaling examples.



1 Introduction 

In recent years, asymptotic efficiency of Bayesian semiparametric methods has enjoyed much attention. The general question concerns a non-parametric model $\mathscr{P}$ in which exclusive interest goes to the estimation of a sufficiently smooth, finite-dimensional functional of interest. Asymptotically, regularity of the estimator combined with the Cramér-Rao bound in the Gaussian location model that forms the limit experiment [27] fixes the rate of convergence to $n^{-1/2}$ and poses a bound to the accuracy of regular estimators, expressed, e.g., through Hájek's convolution [13] and asymptotic minimax theorems [14]. In a Bayesian context, efficiency of estimation is best captured by a so-called Bernstein-von Mises limit (see, e.g., Le Cam and Yang (1990) [30]). Just as frequentist parametric theory for regular estimates extends quite effortlessly to regular semi-parametric problems, semi-parametric extensions of Bernstein-von Mises-type asymptotic behaviour of posteriors proceed without essential problems. Although far from fully developed, some general considerations of Bayesian semi-parametric efficiency are found in [?, 4, 8, 33, 35] (model- and/or prior-specific derivations of the Bernstein-von Mises limit are many, e.g. [3, 5, ?, 16, ?, 23, ?, 25], most of which are formulated in (the conjugacy class of) Gaussian white noise with Gaussian priors). Limits of posteriors on sieves are considered in Ghosal (1999, 2000) [10, 11] and Bontemps (2011) [?]. Kim and Lee (2004) [18], Kim (2006, 2009) [19, 20] and, more recently, Castillo and Nickl (2012) [7] even consider infinite-dimensional limiting posteriors (notwithstanding the objections raised in Freedman (1999) [9]).
However, not all estimators are regular. The quintessential example calls for estimation of a point of discontinuity of a density: to be a bit more specific, consider an almost-everywhere differentiable Lebesgue density on $\mathbb{R}$ that displays a jump at some point $\theta\in\mathbb{R}$; estimators for $\theta$ exist that converge at rate $n^{-1}$ with exponential limit distributions [15]. To illustrate the form that this conclusion takes in a Bayesian context, consider the following theorem.

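Theorem 1.1. For $\theta\in\mathbb{R}$, let $F_\theta(x) = (1-e^{-(x-\theta)})\vee 0$. Assume that $X_1, X_2, \ldots$ form an i.i.d. sample from $F_{\theta_0}$, for some $\theta_0$. Let $\pi:\mathbb{R}\to(0,\infty)$ be a continuous Lebesgue probability density. Then the associated posterior distribution satisfies,

$$ \sup_A \bigl|\, \Pi_n(\theta\in A \mid X_1,\ldots,X_n) - \mathrm{Exp}^-_{\hat\theta_n,\,n}(A) \,\bigr| \xrightarrow{P_{\theta_0}} 0, $$

where $\hat\theta_n = X_{(1)}$ is the maximum likelihood estimate for $\theta_0$.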

The proof of this Bernstein-von Mises limit is elementary and does not depend in any 
crucial way on the particular parametric family of distributions that we chose. 
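As a numerical illustration of Theorem 1.1, the following Python sketch draws a sample from $F_{\theta_0}$, evaluates the posterior with the midpoint rule on a grid in the local variable $u = n(X_{(1)}-\theta)\geq 0$, and computes its total-variation distance to the limit $\mathrm{Exp}^-_{\hat\theta_n,n}$, which in the variable $u$ is the unit-rate exponential law. The standard normal prior is an arbitrary choice of continuous, positive density $\pi$; the function name, grid size and truncation point `umax` are illustrative assumptions, not part of the theorem.

```python
import math
import random


def posterior_tv_to_exponential(n, theta0=0.0, seed=1, umax=50.0, grid=20000):
    """Total-variation distance between the posterior of u = n*(X_(1) - theta)
    and the Exp(1) law, for the unit-rate exponential shifted to theta0."""
    rng = random.Random(seed)
    xs = [theta0 + rng.expovariate(1.0) for _ in range(n)]
    x_min = min(xs)  # maximum likelihood estimate of theta0

    def prior(theta):
        # standard normal density: continuous and strictly positive
        return math.exp(-0.5 * theta * theta) / math.sqrt(2.0 * math.pi)

    # On theta <= X_(1) the likelihood is proportional to exp(n*theta); with
    # theta = x_min - u/n the posterior density in u is proportional to
    # prior(x_min - u/n) * exp(-u) (constant factors cancel after normalising).
    du = umax / grid
    us = [(i + 0.5) * du for i in range(grid)]
    post = [prior(x_min - u / n) * math.exp(-u) for u in us]
    z = sum(post) * du
    limit = [math.exp(-u) for u in us]  # Exp(1) density in u
    return 0.5 * du * sum(abs(p / z - q) for p, q in zip(post, limit))
```

Calling `posterior_tv_to_exponential(n)` for increasing `n` should show the distance shrinking towards zero, since the prior is essentially constant over the $O(1/n)$-neighbourhood of $X_{(1)}$ that carries the posterior mass.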

As a frequentist semi-parametric problem, estimation of a support boundary point is well understood (see Ibragimov and Has'minskii (1981) [15]): assuming that the distribution $P_\theta$ of $X$ is supported on the half-line $[\theta,\infty)$ and an i.i.d. sample $X_1, X_2, \ldots, X_n$ is given, we follow [15] and estimate $\theta$ with the first order statistic $X_{(1)} = \min_i\{X_i\}$. If $P_\theta$ has an absolutely continuous Lebesgue density of the form $p_\theta(x) = \eta(x-\theta)\,1\{x>\theta\}$, its rate of convergence is determined by the behaviour of the quantity $\epsilon\mapsto\int_0^\epsilon\eta(x)\,dx$ for small values of $\epsilon$. If $\eta(x) = x^\alpha(1+o(1))$ as $x\downarrow 0$, for some $\alpha\in(-1,1)$, then,

$$ n^{1/(1+\alpha)}\,(X_{(1)}-\theta) = O_{P_\theta}(1). \qquad (1) $$
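Display (1) can be checked directly in a simple special case. Assume, for illustration, that $X-\theta$ has the density $\eta(x) = (1+\alpha)x^\alpha$ on $(0,1]$; this is of the form $x^\alpha(1+o(1))$ up to the constant $1+\alpha$, which does not affect the rate. Then $P(X_{(1)}-\theta > x) = (1-x^{1+\alpha})^n$, so the rescaled minimum $n^{1/(1+\alpha)}(X_{(1)}-\theta)$ has tail $(1-t^{1+\alpha}/n)^n\to e^{-t^{1+\alpha}}$, a Weibull law. The sketch below (all names are hypothetical) evaluates this tail exactly and also simulates the rescaled minimum by inverting its distribution function.

```python
import math
import random


def scaled_min_tail(n, t, alpha):
    """Exact P(n^{1/(1+alpha)} (X_(1) - theta) > t) when X - theta has
    density (1 + alpha) * x**alpha on (0, 1]."""
    s = t ** (1.0 + alpha) / n  # P(X - theta <= t * n^{-1/(1+alpha)})
    return (1.0 - s) ** n if s < 1.0 else 0.0


def weibull_limit_tail(t, alpha):
    # limiting tail of the rescaled minimum as n -> infinity
    return math.exp(-t ** (1.0 + alpha))


def simulate_scaled_min(n, alpha, reps, seed=0):
    """Draw reps copies of n^{1/(1+alpha)} (X_(1) - theta) using
    P(X_(1) - theta > x) = (1 - x^{1+alpha})^n and inversion."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        v = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        # 1 - v**(1/n), computed stably as -expm1(log(v)/n)
        m = (-math.expm1(math.log(v) / n)) ** (1.0 / (1.0 + alpha))
        draws.append(n ** (1.0 / (1.0 + alpha)) * m)
    return draws
```

Raising `alpha` towards 1 slows the rate $n^{1/(1+\alpha)}$, while letting $\alpha$ decrease towards $-1$ speeds it up; the jump case $\alpha=0$ gives the rate $n$ that is relevant for LAE.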

For densities of this form, for any sequence $\theta_n$ that converges to $\theta$ at rate $n^{-1/(1+\alpha)}$, Hellinger distances obey (see Theorem VI.1.1 in [15]):

$$ n^{1/2}\,H(P_{\theta_n},P_\theta) = O(1). \qquad (2) $$
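For the jump case $\alpha=0$, display (2) can be verified in closed form. Assuming the unit-rate exponential of Theorem 1.1 and the convention $H^2(P,Q)=\int(\sqrt p-\sqrt q)^2$ (other conventions differ by a factor 2), the affinity of $P_{\theta+\delta}$ and $P_\theta$ is $e^{-\delta/2}$, so $H^2(P_{\theta+\delta},P_\theta) = 2(1-e^{-\delta/2})\approx\delta$. Hence $n^{1/2}H(P_{\theta+1/n},P_\theta)\to 1$, whereas a perturbation of the regular size $\delta=n^{-1/2}$ gives $nH^2\approx n^{1/2}\to\infty$. The Python sketch below (function names are hypothetical) checks the closed form against a midpoint-rule integral.

```python
import math


def midpoint(f, a, b, steps):
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def hellinger_sq_numeric(delta, upper=60.0, steps=20000):
    """int (sqrt(p_delta) - sqrt(p_0))^2 for p_s(x) = exp(-(x - s)) 1{x >= s}."""
    # on [0, delta) only p_0 is positive, so the integrand is p_0 itself
    left = midpoint(lambda x: math.exp(-x), 0.0, delta, steps)
    # on [delta, upper] both densities are positive
    right = midpoint(lambda x: (math.exp(-(x - delta) / 2.0) - math.exp(-x / 2.0)) ** 2,
                     delta, upper, steps)
    return left + right


def hellinger_sq_closed_form(delta):
    # affinity exp(-delta/2)  =>  H^2 = 2 (1 - exp(-delta/2)), computed stably
    return -2.0 * math.expm1(-delta / 2.0)
```

Splitting the integral at `delta` keeps the midpoint rule away from the jump of $p_\delta$, where a single grid would lose accuracy.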

If we substitute the estimators $\hat\theta_n = \hat\theta_n(X_1,\ldots,X_n) = X_{(1)}$ for $\theta_n$, uniform tightness of the sequence in the above display signifies rate optimality of the estimator (c.f. Le Cam (1973, 1986) [28, 29]). Regarding asymptotic efficiency beyond rate optimality, e.g. in the sense of minimal asymptotic variance (or other measures of dispersion of the limit distribution), one notices that the (one-sided) limit distributions one obtains for $X_{(1)}$ can always be improved upon by de-biasing (see Section VI.6, examples 1-3 in [15] and Le Cam (1990) [?]).

As a semi-parametric Bayesian question, the matter of estimating support boundaries is not settled by the above: for the posterior, it is the local limiting behaviour of the likelihood around the point of convergence (see, e.g., Theorems VI.2.1-VI.2.3 in [15]) that determines convergence, rather than the behaviour of any particular statistic. The goal of this paper is to shed some light on the behaviour of marginal posteriors for the parameter of interest in semi-parametric, irregular estimation problems, through a study of the Bernstein-von Mises phenomenon. Only the prototypical case of a density of bounded variation, supported on the half-line $[\theta,\infty)$ or on the interval $[0,\theta]$, with a jump at $\theta$, is analysed in detail. We offer a slight abstraction from the prototypical case by considering the class of models that exhibit a weakly converging expansion of the likelihood called local asymptotic exponentiality (LAE) [15], to be compared with local asymptotic normality [26] in regular problems. As in the parametric case of Theorem 1.1, this type of asymptotic behaviour of the likelihood is expected to give rise to a (negative-)exponential marginal posterior satisfying the irregular Bernstein-von Mises limit:

$$ \sup_A \bigl|\, \Pi_n(h\in A \mid X_1,\ldots,X_n) - \mathrm{Exp}^-_{\Delta_n,\,\gamma_{\theta_0,\eta_0}}(A) \,\bigr| \xrightarrow{P_0} 0, \qquad (3) $$

where $h = n(\theta-\theta_0)$ and the random sequence $\Delta_n$ converges weakly to exponentiality (see Definition 2.1). As in the regular case, the limit (3) allows for the asymptotic identification of credible sets with confidence intervals associated with the maximum likelihood estimator. The constant $\gamma_{\theta_0,\eta_0}$ determines the scale of the limiting exponential distribution and, as such, the width of credible sets. In this paper, we explore general sufficient conditions on model and prior to conclude that the limit (3) obtains.
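To make the identification of credible sets with confidence intervals concrete, consider the model of Theorem 1.1, where $\gamma=1$ and the limiting posterior for $\theta$ is $\mathrm{Exp}^-_{X_{(1)},n}$. Writing $\beta$ for the credibility level, the one-sided set $[X_{(1)}-\log(1/(1-\beta))/(n\gamma),\,X_{(1)}]$ carries mass $\beta$ under this limit. Since $n(X_{(1)}-\theta_0)$ is exactly unit-rate exponential in this model, the same interval has frequentist coverage $1-e^{-\log(1/(1-\beta))}=\beta$. The Python sketch below, with hypothetical function names, builds the interval from the limit law (not from the exact posterior) and estimates its coverage by simulation.

```python
import math
import random


def one_sided_credible_interval(x_min, n, gamma, level=0.95):
    """Interval of mass `level` under the limit law Exp^-_{x_min, n*gamma} for theta."""
    width = math.log(1.0 / (1.0 - level)) / (n * gamma)
    return x_min - width, x_min


def coverage(theta0=0.0, n=200, level=0.95, reps=20000, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # min of n unit-rate exponentials shifted to theta0: theta0 + Exp(rate n)
        x_min = theta0 + rng.expovariate(n)
        lo, hi = one_sided_credible_interval(x_min, n, gamma=1.0, level=level)
        hits += lo <= theta0 <= hi
    return hits / reps
```

The interval is one-sided because the limit is: all posterior mass sits to the left of $X_{(1)}$, and its width shrinks at rate $n^{-1}$ rather than $n^{-1/2}$.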

The main theorem is applied in two semi-parametric LAE example models, one for a shift parameter and one for a scale parameter (compare with the two regular semiparametric questions in Stein (1956) [36]). The former is an extension of the setting considered in Theorem 1.1 and the latter includes a problem of estimation of the scale parameter in the family of uniform distributions on $[0,\theta]$ ($\theta>0$).

The paper is structured as follows: in Section 2 we give the main theorem and a corollary that simplifies the formulation. In Section 3 the proof of the main theorem is built up in several steps, from consistency under perturbation [1], to an LAE expansion for integrated likelihoods and on to posterior exponentiality of the type described by (3). Section 4 discusses two semiparametric LAE models to demonstrate that they satisfy the exponential Bernstein-von Mises property (3) asymptotically.

Notation and conventions 

The (frequentist) true distribution of each of the data points in the i.i.d. sample $X^n = (X_1,\ldots,X_n)$ is denoted $P_0$ and assumed to lie in the model $\mathscr{P}$. Associated order statistics are denoted $X_{(1)}, X_{(2)}, \ldots$. The location-scale family associated with the exponential distribution is denoted $\mathrm{Exp}^+_{\Delta,\lambda}$ and its negative version by $\mathrm{Exp}^-_{\Delta,\lambda}$. We localise $\theta$ by introducing $h = n(\theta-\theta_0)$ with inverse $\theta_n(h) = \theta_0+n^{-1}h$. The expectation of a random variable $f$ with respect to a probability measure $P$ is denoted $Pf$; the sample average of $g(X)$ is denoted $\mathbb{P}_n g(X) = n^{-1}\sum_{i=1}^n g(X_i)$ and $\mathbb{G}_n g(X) = n^{1/2}(\mathbb{P}_n g(X)-Pg(X))$. If $h_n$ is stochastic, $P^n_{\theta_n(h_n),\eta}f$ denotes the integral $\int f(\omega)\,\bigl(dP^n_{\theta_n(h_n(\omega)),\eta}/dP^n_0\bigr)(\omega)\,dP^n_0(\omega)$. The Hellinger distance between $P$ and $P'$ is denoted $H(P,P')$ and induces a metric $d_H$ on the space of nuisance parameters $H$ by $d_H(\eta,\eta') = H(P_{\theta_0,\eta},P_{\theta_0,\eta'})$, for all $\eta,\eta'\in H$. A prior on (a subset $\Theta$ of) $\mathbb{R}^k$ is said to be thick (at $\theta\in\Theta$) if it is Lebesgue absolutely continuous with a density that is continuous and strictly positive (at $\theta$).
2 Main results



Throughout this paper we consider estimation of a functional $\theta:\mathscr{P}\to\mathbb{R}$ on a nonparametric model $\mathscr{P}$ based on a sample $X_1, X_2, \ldots$, distributed i.i.d. according to some unknown $P_0\in\mathscr{P}$. We assume that $\mathscr{P}$ is parametrized in terms of a one-dimensional parameter of interest $\theta\in\Theta$ and a nuisance parameter $\eta\in H$ so that we can write $\mathscr{P} = \{P_{\theta,\eta} : \theta\in\Theta, \eta\in H\}$, and that $\mathscr{P}$ is dominated by a $\sigma$-finite measure on the sample space with densities $p_{\theta,\eta}$. The set $\Theta$ is open in $\mathbb{R}$, and $(H,d_H)$ is an infinite-dimensional metric space (to be specified further at later stages). Assuming identifiability, there exist unique $(\theta_0,\eta_0)\in\Theta\times H$ such that $P_0 = P_{\theta_0,\eta_0}$. Assuming measurability of the map $(\theta,\eta)\mapsto P_{\theta,\eta}$ and priors $\Pi_\Theta$ on $\Theta$ and $\Pi_H$ on $H$, the prior $\Pi$ on $\mathscr{P}$ is defined as the product prior $\Pi_\Theta\times\Pi_H$ on $\Theta\times H$, lifted to $\mathscr{P}$. The subsequent sequence of posteriors [12] takes the form,

$$ \Pi_n(A\mid X_1,\ldots,X_n) = \int_A\prod_{i=1}^n p(X_i)\,d\Pi(P)\Bigm/\int_{\mathscr{P}}\prod_{i=1}^n p(X_i)\,d\Pi(P), \qquad (4) $$

where $A$ is any measurable model subset.

Throughout most of this paper, the parameter of interest $\theta$ is represented in localised form, by centering on $\theta_0$ and rescaling: $h = n(\theta-\theta_0)\in\mathbb{R}$. (We also make use of the inverse $\theta_n(h) = \theta_0+n^{-1}h$.) The following (irregular) local expansion of the likelihood is due to Ibragimov and Has'minskii (1981) [15].

Definition 2.1 (Local asymptotic exponentiality). A one-dimensional parametric model $\theta\mapsto P_\theta$ is said to be locally asymptotically exponential (LAE) at $\theta_0\in\Theta$ if there exists a sequence of random variables $(\Delta_n)$ and a positive constant $\gamma_{\theta_0}$ such that for all $(h_n)$, $h_n\to h$,

$$ \prod_{i=1}^n\frac{p_{\theta_0+n^{-1}h_n}}{p_{\theta_0}}(X_i) = \exp\bigl(h\gamma_{\theta_0}+o_{P_{\theta_0}}(1)\bigr)\,1_{\{h<\Delta_n\}}, $$

with $\Delta_n$ converging weakly to $\mathrm{Exp}^+_{0,\gamma_{\theta_0}}$.
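The LAE expansion can be observed numerically in a shift model of the kind studied in Section 4.1. Assume, purely for illustration, the nuisance density $\eta(y) = (1+y)^{-2}$ on $[0,\infty)$, which jumps from $0$ to $\eta(0)=1$ at the boundary. For $h<\Delta_n = n(X_{(1)}-\theta_0)$ a first-order expansion gives $\log\prod_i p_{\theta_0+h/n}/p_{\theta_0}(X_i)\approx -h\,\mathbb{P}_n(\eta'/\eta)(X-\theta_0)\to -h\int_0^\infty\eta'(y)\,dy = h\,\eta(0)$, which suggests $\gamma_{\theta_0}=\eta(0)$ here, while for $h>\Delta_n$ the likelihood ratio vanishes. The Python sketch below (all names hypothetical) evaluates the exact log-likelihood ratio.

```python
import math
import random


def eta(y):
    # hypothetical nuisance density on [0, inf) with a jump of size eta(0) = 1 at 0
    return (1.0 + y) ** -2 if y >= 0.0 else 0.0


def sample(n, theta0, rng):
    # inverse CDF of eta: F(y) = 1 - 1/(1 + y)  =>  y = u / (1 - u), u in [0, 1)
    out = []
    for _ in range(n):
        u = rng.random()
        out.append(theta0 + u / (1.0 - u))
    return out


def log_likelihood_ratio(xs, theta0, h):
    """log prod_i p_{theta0 + h/n}(X_i) / p_{theta0}(X_i) in the shift model."""
    n = len(xs)
    theta = theta0 + h / n
    total = 0.0
    for x in xs:
        num = eta(x - theta)
        if num == 0.0:
            return -math.inf  # h > Delta_n: an observation lies left of theta
        total += math.log(num) - math.log(eta(x - theta0))
    return total
```

For negative $h$ the indicator in the LAE expansion always equals one, so `log_likelihood_ratio(xs, 0.0, -1.0)` should be close to $-\gamma_{\theta_0} = -1$ for large samples.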

In many examples, e.g. that of Subsection 4.1, $\Delta_n$ and its weak limit are independent of $\theta_0$. This definition should be viewed as an irregular variation on the one-dimensional version of Le Cam's local asymptotic normality (LAN) [26], which forms the smoothness requirement in the context of the semiparametric Bernstein-von Mises theorem (see Bickel and Kleijn (2012) [1]). Just as the LAN expansion gives rise to asymptotic normality of the marginal posterior for the parameter of interest, an LAE expansion is expected to give rise to a one-sided, exponential marginal posterior limit, c.f. (3).

In order to establish the limit (3), we study posterior convergence of a particular type, termed consistency under perturbation in [1]. One can compare this type of consistency with ordinary posterior consistency in non-parametric models, except that here the non-parametric component is the nuisance parameter $\eta$ and we allow for (stochastic) perturbation by (local) deformations of the parameter of interest $\theta_n(h_n) = \theta_0+n^{-1}h_n$. In regular situations, this gives rise to accumulation of posterior mass around so-called least-favourable submodels [1], but here the parameter of interest is irregular and the situation is less involved: accumulation of posterior mass occurs around $(\theta_n(h_n),\eta_0)$. Therefore, posterior consistency under perturbation describes concentration in $d_H$-neighbourhoods of the form ($\rho>0$),

$$ D(\rho) = \{\eta\in H : d_H(\eta,\eta_0)<\rho\}. \qquad (5) $$

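To guarantee sufficiency of prior mass around the point of convergence, we use Kullback-Leibler-type neighbourhoods of the form,

$$ K_n(\rho,M) = \Bigl\{\eta\in H : P_0\Bigl(\sup_{|h|\leq M} -1_{A_{\theta_n(h)}}\log\frac{p_{\theta_n(h),\eta}}{p_{\theta_0,\eta_0}}\Bigr)\leq\rho^2,\ P_0\Bigl(\sup_{|h|\leq M} -1_{A_{\theta_n(h)}}\log\frac{p_{\theta_n(h),\eta}}{p_{\theta_0,\eta_0}}\Bigr)^2\leq\rho^2\Bigr\}, \qquad (6) $$

where, in the present LAE setting,

$$ A_{\theta_n(h)} = \{x : p_{\theta_n(h),\eta}(x)>0\}. $$

Note that $\prod_{i=1}^n 1_{A_{\theta_n(h)}}(X_i) = 1_{\{h<\Delta_n\}}$, as in the LAE expansion.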

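Suppose that $A$ in (4) is of the form $A = B\times H$ for some measurable $B\subset\Theta$. Since we use a product prior $\Pi_\Theta\times\Pi_H$, the marginal posterior of the parameter $\theta\in\Theta$ depends on the nuisance factor only through the integrated likelihood,

$$ S_n:\Theta\to\mathbb{R}:\ \theta\mapsto\int_H\prod_{i=1}^n\frac{p_{\theta,\eta}}{p_{\theta_0,\eta_0}}(X_i)\,d\Pi_H(\eta), \qquad (7) $$

and its localised version, $h\mapsto s_n(h) = S_n(\theta_0+n^{-1}h)$. One of the conditions of the subsequent theorem is a domination condition based on the quantities,

$$ U_n(\rho,h_n) = \sup_{\eta\in D(\rho)}P^n_{\theta_0,\eta}\Bigl(\prod_{i=1}^n\frac{p_{\theta_n(h_n),\eta}}{p_{\theta_0,\eta}}(X_i)\Bigr). $$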

Another condition required in the irregular version of the semiparametric Bernstein-von Mises theorem is one-sided contiguity (c.f. condition (iv) of Theorem 2.2 below). Lemma 3.2 shows that such one-sided contiguity and domination as in (8) are closely related and provides two different sufficient conditions for both to hold in general. The log-Lipschitz construction is used in the examples of Section 4; in other applications of the theorem it may be more convenient to by-pass Lemma 3.2 and prove (8) and contiguity directly from the model definition.

Theorem 2.2 (LAE - Semiparametric Bernstein-von Mises). Let $X_1, X_2, \ldots$ be distributed i.i.d.-$P_0$, with $P_0\in\mathscr{P}$. Let $\Pi_H$ and $\Pi_\Theta$ be priors on $H$ and $\Theta$ and assume that $\Pi_\Theta$ is thick at $\theta_0$. Suppose that $\theta\mapsto P_{\theta,\eta}$ is stochastically LAE in the $\theta$-direction, for all $\eta$ in a $d_H$-neighbourhood of $\eta_0$, and that $\gamma_{\theta_0,\eta_0}>0$. Assume also that for large enough $n$, the map $h\mapsto s_n(h)$ is continuous on $(-\infty,\Delta_n]$, $P_0^n$-almost-surely. Furthermore, assume that there exists a sequence $(\rho_n)$ with $\rho_n\downarrow 0$, $n\rho_n^2\to\infty$ such that,

(i) for all $M>0$, there exists a $K>0$ such that for large enough $n$,
$$ \Pi_H\bigl(K_n(\rho_n,M)\bigr)\geq e^{-Kn\rho_n^2}, $$

(ii) for all $n$ large enough, the Hellinger metric entropy satisfies,
$$ N(\rho_n,H,d_H)\leq e^{n\rho_n^2}, $$

and, for every bounded, stochastic $(h_n)$,

(iii) the model satisfies the domination condition,
$$ U_n(\rho_n,h_n) = O(1), \qquad (8) $$

(iv) for every $\eta\in D(\rho)$ with $\rho>0$ small enough, the sequence $P^n_{\theta_n(h_n),\eta}$ is contiguous with respect to the sequence $P^n_{\theta_0,\eta}$,

(v) and for all $L>0$, Hellinger distances satisfy the uniform bound,
$$ \sup_{\eta\in H:\,d_H(\eta,\eta_0)>L\rho_n}\frac{H(P_{\theta_n(h_n),\eta},P_{\theta_0,\eta})}{d_H(\eta,\eta_0)} = o(1). $$

Finally, suppose that,

(vi) for every $(M_n)$, $M_n\to\infty$, the posterior satisfies
$$ \Pi_n\bigl(|h|\leq M_n\mid X_1,\ldots,X_n\bigr)\xrightarrow{P_0}1. $$

Then the sequence of marginal posteriors for $\theta$ converges in total variation to a negative exponential distribution,
$$ \sup_A\bigl|\,\Pi_n(h\in A\mid X_1,\ldots,X_n)-\mathrm{Exp}^-_{\Delta_n,\gamma_{\theta_0,\eta_0}}(A)\,\bigr|\xrightarrow{P_0}0. \qquad (9) $$

Regarding the nuisance rate of convergence $\rho_n$, conditions (i) and (ii) are expected in some form or other in order to achieve consistency under perturbation. As stated, they almost coincide with requirements for non-parametric convergence at rate $(\rho_n)$ without a parameter of interest [12]. A simplified version of Theorem 2.2 that does not refer to any specific nuisance rate $\rho_n$ is stated as Corollary 2.1. In the rate-free case of Corollary 2.1, conditions on prior mass and entropy numbers ((i) and (ii)) essentially require nuisance consistency (at some rate rather than a specific one), thus weakening requirements on model and prior. Concerning conditions (iii)-(v), note that, typically, the numerator in condition (v) converges to zero at rate $O(n^{-1/2})$, c.f. (2), while the denominator goes to zero at a slower, non-parametric rate. As such, condition (v) is to be viewed as a weak condition that rarely poses a true restriction on the applicability of the theorem. Furthermore, Lemma 3.2 formulates two slightly stronger conditions that validate both (iii) and (iv) above for any rate $(\rho_n)$.

Condition (vi) of Theorem 2.2 appears to be the hardest to verify in applications. On the other hand, it cannot be weakened, since (vi) also follows from (9). Besides condition (i), only condition (vi) implies a requirement on the nuisance prior $\Pi_H$. Experience with the LAN version [1] suggests that conditions (i)-(v) are relatively weak in applications, while (vi) harbours the potential for negative surprises, mainly due to semiparametric bias leading to sub-optimal asymptotic variance, sub-optimal marginal rate or even marginal inconsistency. On the other hand, there are conditions under which condition (vi) is easily seen to be valid: in Section 3.3 we present a model condition that guarantees marginal posterior convergence according to (vi) for any choice of the nuisance prior $\Pi_H$.
As discussed already after Theorem 2.2, in many situations the domination condition holds for any rate $(\rho_n)$. This circumstance simplifies the result substantially, leading to conditions that are comparable to those of Schwartz' consistency theorem (see Schwartz (1965)).

Corollary 2.1 (Rate-free Semiparametric Bernstein-von Mises). Let $X_1, X_2, \ldots$ be distributed i.i.d.-$P_0$, with $P_0\in\mathscr{P}$, and let $\Pi_\Theta$ be thick at $\theta_0$. Suppose that $\theta\mapsto P_{\theta,\eta}$ is stochastically LAE in the $\theta$-direction, for all $\eta$ in a $d_H$-neighbourhood of $\eta_0$, and that $\gamma_{\theta_0,\eta_0}$ is strictly positive. Also assume that for large enough $n$, the map $h\mapsto s_n(h)$ is continuous on $(-\infty,\Delta_n]$, $P_0^n$-almost-surely. Furthermore, assume that,

(i) for all $\rho>0$, the Hellinger metric entropy satisfies $N(\rho,H,d_H)<\infty$, and the nuisance prior satisfies $\Pi_H(K(\rho))>0$,

(ii) for every $M>0$, there exists an $L>0$ such that for all $\rho>0$ and large enough $n$, $K(\rho)\subset K_n(L\rho,M)$,

and that for every bounded, stochastic $(h_n)$,

(iii) there exists an $r>0$ such that $U_n(r,h_n) = O(1)$,

(iv) for every $\eta\in D(r)$ the sequence $P^n_{\theta_n(h_n),\eta}$ is contiguous to the sequence $P^n_{\theta_0,\eta}$,

(v) and that Hellinger distances satisfy $\sup_{\eta\in H}H(P_{\theta_n(h_n),\eta},P_{\theta_0,\eta}) = O(n^{-1/2})$.

Finally, assume that,

(vi) for every $(M_n)$, $M_n\to\infty$, the posterior satisfies,
$$ \Pi_n\bigl(|h|\leq M_n\mid X_1,\ldots,X_n\bigr)\xrightarrow{P_0}1. $$

Then marginal posteriors for $\theta$ converge in total variation to a negative exponential distribution,
$$ \sup_A\bigl|\,\Pi_n(h\in A\mid X_1,\ldots,X_n)-\mathrm{Exp}^-_{\Delta_n,\gamma_{\theta_0,\eta_0}}(A)\,\bigr|\xrightarrow{P_0}0. $$



Proof. Under conditions (i), (ii), (v), and the stochastic LAE assumption, the assertion of Corollary 3.1 holds. Due to conditions (iii) (and (iv)), conditions (iii) (respectively (iv)) in Theorem 2.2 are satisfied for large enough $n$. Condition (vi) then suffices for the assertion of Theorem 3.3. □
3 Asymptotic posterior exponentiality



In this section we give the proof of Theorem 2.2 in several steps: the first step (Subsection 3.1) is a proof of consistency under perturbation under a condition on the nuisance prior $\Pi_H$ and a testing condition. In Subsection 3.2 we show that the integral of the likelihood with respect to the nuisance prior displays an LAE expansion, if consistency under perturbation obtains and the contiguity/domination conditions are satisfied. In the third step, also discussed in Subsection 3.2, we show that an LAE expansion of the integrated likelihood gives rise to a semiparametric exponential limit for the posterior in total variation, if the marginal posterior for the parameter of interest converges at $n^{-1}$-rate. The rate of marginal convergence depends on the existence of a suitable test sequence, which is discussed in Subsection 3.3. Put together, the results constitute a proof of Theorem 2.2. Stated conditions are verified in two examples in Section 4.

3.1 Posterior convergence under perturbation 

Given a rate sequence $(\rho_n)$, $\rho_n\downarrow 0$, we say that the conditioned nuisance posterior is consistent under $n^{-1}$-perturbation at rate $\rho_n$ if, for all bounded, stochastic sequences $(h_n)$,

$$ \Pi_n\bigl(D^c(\rho_n)\bigm|\theta=\theta_0+n^{-1}h_n;\,X_1,\ldots,X_n\bigr)\xrightarrow{P_0}0. $$

For a more elaborate discussion of this property, the reader is referred to Bickel and Kleijn (2012) [1].

Theorem 3.1 (Posterior convergence under perturbation). Assume there is a sequence $(\rho_n)$, $\rho_n\downarrow 0$, $n\rho_n^2\to\infty$ with the property that for all $M>0$ there exists a $K>0$ such that,

$$ \Pi_H\bigl(K_n(\rho_n,M)\bigr)\geq e^{-Kn\rho_n^2},\qquad N(\rho_n,H,d_H)\leq e^{n\rho_n^2}, $$

for large enough $n$. Assume also that for all $L>0$ and all bounded, stochastic $(h_n)$,

$$ \sup_{\eta\in H:\,d_H(\eta,\eta_0)>L\rho_n}\frac{H(P_{\theta_n(h_n),\eta},P_{\theta_0,\eta})}{d_H(\eta,\eta_0)} = o(1). \qquad (10) $$

Then, for every bounded, stochastic $(h_n)$ there exists an $L>0$ such that,

$$ \Pi_n\bigl(D^c(L\rho_n)\bigm|\theta=\theta_0+n^{-1}h_n;\,X_1,\ldots,X_n\bigr) = o_{P_0}(1). $$

The proof of this theorem can be broken down into two separate steps, with the following testing condition in between: for every bounded, stochastic $(h_n)$ and all $L>0$ large enough, a test sequence $(\phi_n)$ and constant $C>0$ must exist, such that,

$$ P_0^n\phi_n\to 0,\qquad \sup_{\eta\in D^c(L\rho_n)}P^n_{\theta_n(h_n),\eta}(1-\phi_n)\leq e^{-CL^2n\rho_n^2}, \qquad (11) $$

for large enough $n$. According to Lemma 3.2 in [1], the metric entropy condition and the "cone condition" (10) suffice for the existence of such a test sequence. Here, we are brief and refer to [1] for a full discussion. While the above testing argument is instrumental in the control of the numerator of (4), the denominator of the posterior is lower-bounded with the help of the following lemma, which adapts Lemma 8.1 in [12] to the $n^{-1}$-perturbed, irregular setting.
Lemma 3.1. Let $(h_n)$ be stochastic and bounded by some $M>0$. Then

$$ P_0^n\Bigl(\Bigl\{\int_H\prod_{i=1}^n\frac{p_{\theta_n(h_n),\eta}}{p_{\theta_0,\eta_0}}(X_i)\,d\Pi_H(\eta)<e^{-(1+C)n\rho^2}\,\Pi_H\bigl(K_n(\rho,M)\bigr)\Bigr\}\cap\{h_n<\Delta_n\}\Bigr)\leq\frac{1}{C^2n\rho^2}, $$

for all $C>0$, $\rho>0$ and $n\geq 1$, where $\theta_n(h_n) = \theta_0+n^{-1}h_n$.

In many applications, $(\rho_n)$ does not play an explicit role because consistency at some rate is sufficient. The following provides a possible formulation of weakened conditions guaranteeing consistency under perturbation. Corollary 3.1 is based on the family of Kullback-Leibler neighbourhoods that would also play a role for marginal posterior consistency of the nuisance with known $\theta_0$ (as in [12]):

$$ K(\rho) = \Bigl\{\eta\in H : -P_0\log\frac{p_{\theta_0,\eta}}{p_{\theta_0,\eta_0}}\leq\rho^2,\ P_0\Bigl(\log\frac{p_{\theta_0,\eta}}{p_{\theta_0,\eta_0}}\Bigr)^2\leq\rho^2\Bigr\}, $$

for $\rho>0$.

Corollary 3.1. Assume that for all $\rho>0$, $N(\rho,H,d_H)<\infty$ and $\Pi_H(K(\rho))>0$. Furthermore, assume that for every stochastic, bounded $(h_n)$,

(i) for every $M>0$, there exists an $L>0$ such that for all $\rho>0$ and large enough $n$, $K(\rho)\subset K_n(L\rho,M)$,

(ii) Hellinger distances satisfy $\sup_{\eta\in H}H(P_{\theta_n(h_n),\eta},P_{\theta_0,\eta}) = O(n^{-1/2})$.

Then there exists a sequence $(\rho_n)$, $\rho_n\downarrow 0$, $n\rho_n^2\to\infty$, such that the conditional nuisance posterior converges under $n^{-1}$-perturbation at rate $(\rho_n)$.

Proof. See the proof of Corollary 3.3 in Bickel and Kleijn (2012) [1]. □

3.2 Marginal posterior asymptotic exponentiality 

To see how the irregular Bernstein-von Mises assertion (3) arises, we note the following: the marginal posterior density $\pi_n:\Theta\to\mathbb{R}$ for the parameter of interest with respect to the prior $\Pi_\Theta$ is given by,

$$ \pi_n(\theta) = \int_H\prod_{i=1}^n\frac{p_{\theta,\eta}}{p_{\theta_0,\eta_0}}(X_i)\,d\Pi_H(\eta)\Bigm/\int_\Theta\int_H\prod_{i=1}^n\frac{p_{\theta',\eta}}{p_{\theta_0,\eta_0}}(X_i)\,d\Pi_H(\eta)\,d\Pi_\Theta(\theta'), $$

$P_0^n$-almost-surely. This form resembles that of a parametric posterior density on $\Theta$ if one replaces the ordinary, parametric likelihood by the integral of the semiparametric likelihood with respect to the nuisance prior, c.f. $S_n(\theta)$ in (7). If $S_n(\theta)$ displays properties similar to those that lead to posterior asymptotic normality in the smooth parametric case, we may hope that in the irregular, semiparametric setting the classical proof can be largely maintained. More specifically, we shall replace the LAN expansion of the parametric likelihood by a stochastic LAE expansion of the likelihood integrated over the nuisance as in (7). Theorem 3.3 uses this observation to reduce the proof of the main theorem of this paper to a strictly parametric discussion, much in the way the proof of asymptotic posterior normality in [1] mimics the parametric proof of Le Cam and Yang (1990) [30].

In this subsection, we prove marginal posterior asymptotic exponentiality in two parts: first we show that $S_n(\theta)$ satisfies an LAE expansion of its own, and second, we use this to obtain the Bernstein-von Mises assertion (3), proceeding along the lines of proofs presented in Le Cam and Yang (1990) [30], Kleijn and van der Vaart (2012) [22] and Kleijn (2003) [21]. We restrict attention to the case in which the model itself is stochastically LAE and the posterior is consistent under $n^{-1}$-perturbation (although other, less stringent formulations are conceivable).

Theorem 3.2 (Integrated Local Asymptotic Exponentiality). Suppose that the model is stochastically locally asymptotically exponential in the $\theta$-direction at all points $(\theta_0,\eta)$ ($\eta\in H$) and that conditions (iii) and (iv) of Theorem 2.2 are satisfied. Furthermore, assume that model and prior $\Pi_H$ are such that for some rate $(\rho_n)$ and every bounded, stochastic $(h_n)$,

$$ \Pi_n\bigl(D^c(\rho_n)\bigm|\theta=\theta_0+n^{-1}h_n;\,X_1,\ldots,X_n\bigr)\xrightarrow{P_0}0. $$

Then the integral LAE expansion holds, i.e.,

$$ \int_H\prod_{i=1}^n\frac{p_{\theta_n(h_n),\eta}}{p_{\theta_0,\eta_0}}(X_i)\,d\Pi_H(\eta) = \int_H\prod_{i=1}^n\frac{p_{\theta_0,\eta}}{p_{\theta_0,\eta_0}}(X_i)\,d\Pi_H(\eta)\,\exp\bigl(h_n\gamma_{\theta_0,\eta_0}+o_{P_0}(1)\bigr)\,1_{\{h_n<\Delta_n\}}, $$

for any stochastic sequence $(h_n)\subset\mathbb{R}$ that is bounded in $P_0$-probability.

The following theorem uses the above integrated LAE expansion in conjunction with 
a marginal posterior convergence condition to derive the exponential Bernstein-von Mises 
assertion. Marginal posterior convergence forms the subject of the next subsection. 

Theorem 3.3 (Posterior asymptotic exponentiality). Let $\Theta$ be open in $\mathbb{R}$ with thick prior $\Pi_\Theta$. Suppose that for every $n\geq 1$, $h\mapsto s_n(h)$ is continuous on $(-\infty,\Delta_n]$, $P_0$-almost-surely. Assume that for every stochastic sequence $(h_n)\subset\mathbb{R}$ that is bounded in probability,

$$ \frac{s_n(h_n)}{s_n(0)} = \exp\bigl(h_n\gamma_{\theta_0,\eta_0}+o_{P_0}(1)\bigr)\,1_{\{h_n<\Delta_n\}}, \qquad (12) $$

for some positive constant $\gamma_{\theta_0,\eta_0}$. Suppose that for every $M_n\to\infty$, we have,

$$ \Pi_n\bigl(|h|\leq M_n\mid X_1,\ldots,X_n\bigr)\xrightarrow{P_0}1. \qquad (13) $$

Then the sequence of marginal posteriors for $\theta$ is asymptotically exponential in $P_0$-probability, converging in total variation to a negative exponential distribution,

$$ \sup_A\bigl|\,\Pi_n(h\in A\mid X_1,\ldots,X_n)-\mathrm{Exp}^-_{\Delta_n,\gamma_{\theta_0,\eta_0}}(A)\,\bigr|\xrightarrow{P_0}0. \qquad (14) $$
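The reduction behind Theorems 3.2 and 3.3 can be imitated in a deliberately simple toy model. Assume, for illustration only, that the nuisance ranges over three exponential densities $\eta_\lambda(y)=\lambda e^{-\lambda y}$, $\lambda\in\{0.5,1,2\}$, with a uniform prior, that $\Pi_\Theta$ is standard normal, and that the data come from $\theta_0=0$, $\lambda_0=1$. Each $\eta_\lambda$ has LAE constant $\eta_\lambda(0)=\lambda$, so the marginal posterior should approach $\mathrm{Exp}^-_{\Delta_n,\gamma}$ with $\gamma=\lambda_0$. The Python sketch below (a finite nuisance set, not the infinite-dimensional setting of the theorems; all names hypothetical) integrates the likelihood over the nuisance prior as in (7) and reports the total-variation distance of the marginal posterior of $u=n(X_{(1)}-\theta)=\Delta_n-h$ to the $\mathrm{Exp}(\lambda_0)$ law.

```python
import math
import random


def marginal_posterior_tv(n, lambdas=(0.5, 1.0, 2.0), lam0=1.0, seed=0,
                          umax=60.0, grid=30000):
    """TV distance between the marginal posterior of u = n*(X_(1) - theta) and
    the Exp(lam0) law in a toy shift model with a finite nuisance set."""
    rng = random.Random(seed)
    xs = [rng.expovariate(lam0) for _ in range(n)]  # data from theta0 = 0, eta_{lam0}
    x_min, total = min(xs), sum(xs)
    # At theta = x_min - u/n the likelihood under eta_lam equals
    # exp(n*log(lam) - lam*(total - n*x_min)) * exp(-lam*u): keep the
    # u-free factor as a log-weight for the integral over the nuisance.
    logw = [n * math.log(lam) - lam * (total - n * x_min) for lam in lambdas]
    top = max(logw)
    w = [math.exp(v - top) for v in logw]  # uniform nuisance prior weights cancel
    du = umax / grid
    dens, limit = [], []
    for i in range(grid):
        u = (i + 0.5) * du
        theta = x_min - u / n
        prior = math.exp(-0.5 * theta * theta)  # unnormalised N(0, 1) prior on theta
        # integrated likelihood at this theta, up to a constant factor
        dens.append(prior * sum(wk * math.exp(-lam * u) for wk, lam in zip(w, lambdas)))
        limit.append(lam0 * math.exp(-lam0 * u))
    z = sum(dens) * du
    return 0.5 * du * sum(abs(d / z - q) for d, q in zip(dens, limit))
```

In this toy the nuisance posterior concentrates on $\lambda_0$ very quickly, so the integrated likelihood behaves like the single likelihood at $\eta_{\lambda_0}$; in the semiparametric setting this is the role played by consistency under perturbation.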

Conditions (iii) and (iv) of Theorem 2.2 are crucial in the derivation of the two theorems presented above. In the following lemma we present two sufficient conditions for both the domination and the one-sided contiguity condition to hold. The first method poses the domination condition in slightly stronger form (see "q-domination" below); the second relies on a log-Lipschitz condition for model densities and uniform finiteness of exponential moments of the Lipschitz constant.



Lemma 3.2. Suppose that the model satisfies at least one of the following two conditions:

(i) ("q-domination" condition) for every bounded, stochastic $(h_n)$, small enough $\rho>0$, and some $q>1$,
$$ \sup_{\eta\in D(\rho)}P^n_{\theta_0,\eta}\Bigl(\prod_{i=1}^n\frac{p_{\theta_n(h_n),\eta}}{p_{\theta_0,\eta}}(X_i)\Bigr)^q = O(1), \qquad (15) $$

(ii) (log-Lipschitz condition) or, for all $\eta\in H$ there exists a measurable $m_{\theta_0,\eta}>0$ such that for every $x\in A_{\theta_0}$ and for every $\theta$ in a neighbourhood of $\theta_0$,
$$ \frac{p_{\theta,\eta}}{p_{\theta_0,\eta}}(x)\leq e^{m_{\theta_0,\eta}(x)\,|\theta-\theta_0|}, $$
and for small enough $\rho>0$ and all $K>0$, $\sup_{\eta\in D(\rho)}P_{\theta_0,\eta}\,e^{Km_{\theta_0,\eta}}<\infty$.

Then, for fixed $\rho>0$ small enough,

(i) the model satisfies the domination condition
$$ \sup_{\eta\in D(\rho)}P^n_{\theta_0,\eta}\Bigl(\prod_{i=1}^n\frac{p_{\theta_n(h_n),\eta}}{p_{\theta_0,\eta}}(X_i)\Bigr) = O(1), $$

(ii) and, for every $\eta\in D(\rho)$, the sequence $(P^n_{\theta_n(h_n),\eta})$ is contiguous with respect to $(P^n_{\theta_0,\eta})$.

The log-Lipschitz version of this lemma is used in both examples of Section 4 to satisfy conditions (iii) and (iv) of Theorem 2.2.



3.3 Marginal posterior convergence at $n^{-1}$-rate

One of the conditions in the main theorem is marginal consistency at rate $n^{-1}$, so that the posterior measure of a sequence of model subsets of the form

$$ \Theta_n\times H = \{(\theta,\eta)\in\Theta\times H : n|\theta-\theta_0|<M_n\}, $$

converges to one in $P_0$-probability, for every sequence $(M_n)$ such that $M_n\to\infty$. As mentioned in [1], (semiparametric) marginal posteriors have not been studied extensively or systematically in the literature. As a result, fundamental questions (e.g. semiparametric bias) concerning marginal posterior consistency have not yet received the attention they deserve. Here, we present a straightforward formulation of sufficient conditions, based solely on bounded likelihood ratios. This has the advantage of leaving the nuisance prior completely unrestricted, but may prove to be too stringent a condition on the model in some applications. Conceivably [?], the nuisance prior has a much more significant role to play in questions on marginal consistency. The inadequacy of Lemma 3.3 manifests itself primarily through the occurrence of a supremum over the nuisance space $H$ in condition (17), a uniformity that is too coarse. It can be refined somewhat by requiring a uniform bound on the likelihood ratios on a sequence of model subsets that capture most of the full nonparametric posterior mass. Reservations aside, it appears from the examples of Section 4 that the lemma is also useful in the form stated.
Lemma 3.3. Let the sequence of maps $\theta\mapsto S_n(\theta)$ be $P_0$-almost surely continuous on $(-\infty,\Delta_n]$ and exhibit the stochastic integral LAE property. Furthermore, assume that there exists a constant $C>0$ such that for any $(M_n)$, $M_n\to\infty$, $M_n\leq n$ for $n\geq 1$, and $M_n = o(n)$,

$$ P_0^n\Bigl(\sup_{\eta\in H}\sup_{\theta\in\Theta_n^c}\mathbb{P}_n\log\frac{p_{\theta,\eta}}{p_{\theta_0,\eta}}\leq-\frac{CM_n}{n}\Bigr)\to 1. \qquad (17) $$

Then, for any nuisance prior $\Pi_H$ and $\Pi_\Theta$ that is thick at $\theta_0$,

$$ \Pi_n\bigl(n|\theta-\theta_0|>M_n\mid X_1,\ldots,X_n\bigr)\xrightarrow{P_0}0, $$

for any $(M_n)$, $M_n\to\infty$.



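Proof. Let us first note that if marginal consistency holds for a sequence $M_n$, then it also holds for any sequence $M'_n$ that diverges faster (i.e. if $M_n = O(M'_n)$). Without loss of generality, we therefore assume that $M_n$ diverges more slowly than $n$, i.e. $M_n = o(n)$. We can also assume $M_n\leq n$ for $n\geq 1$. Define $F_n$ to be the events in (17), so that $P_0^n(F_n^c) = o(1)$ by assumption. In addition, let

$$ G_n = \Bigl\{(X_1,\ldots,X_n) : \int_\Theta S_n(\theta)\,d\Pi_\Theta(\theta)>e^{-CM_n/2}\,S_n(\theta_0)\Bigr\}. $$

By Lemma 3.4, $P_0^n(G_n^c) = o(1)$ as well. Hence,

$$ \begin{aligned} P_0^n\Pi_n\bigl(n|\theta-\theta_0|>M_n\mid X_1,\ldots,X_n\bigr) &\leq P_0^n\Pi_n\bigl(n|\theta-\theta_0|>M_n\mid X_1,\ldots,X_n\bigr)\,1_{F_n\cap G_n}(X_1,\ldots,X_n)+o(1)\\ &\leq e^{CM_n/2}\,P_0^n\Bigl(\frac{1_{F_n}}{S_n(\theta_0)}\int_{\Theta_n^c}\int_H\prod_{i=1}^n\frac{p_{\theta,\eta}}{p_{\theta_0,\eta_0}}(X_i)\,d\Pi_H(\eta)\,d\Pi_\Theta(\theta)\Bigr)+o(1). \end{aligned} $$

On the events $F_n$ we have

$$ \begin{aligned} \int_{\Theta_n^c}\int_H\prod_{i=1}^n\frac{p_{\theta,\eta}}{p_{\theta_0,\eta_0}}(X_i)\,d\Pi_H(\eta)\,d\Pi_\Theta(\theta) &= \int_{\Theta_n^c}\int_H\prod_{i=1}^n\frac{p_{\theta_0,\eta}}{p_{\theta_0,\eta_0}}(X_i)\,\exp\Bigl(n\mathbb{P}_n\log\frac{p_{\theta,\eta}}{p_{\theta_0,\eta}}\Bigr)\,d\Pi_H(\eta)\,d\Pi_\Theta(\theta)\\ &\leq \int_H\prod_{i=1}^n\frac{p_{\theta_0,\eta}}{p_{\theta_0,\eta_0}}(X_i)\,d\Pi_H(\eta)\,\sup_{\eta\in H}\sup_{\theta\in\Theta_n^c}\exp\Bigl(n\mathbb{P}_n\log\frac{p_{\theta,\eta}}{p_{\theta_0,\eta}}\Bigr)\\ &= S_n(\theta_0)\exp\Bigl(\sup_{\eta\in H}\sup_{\theta\in\Theta_n^c}n\mathbb{P}_n\log\frac{p_{\theta,\eta}}{p_{\theta_0,\eta}}\Bigr)\leq S_n(\theta_0)\,e^{-CM_n}, \end{aligned} $$

so that the bound in the previous display is at most $e^{-CM_n/2}+o(1)$, which ultimately proves marginal consistency at rate $n^{-1}$. □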
In the proof of Lemma 3.3, the lower bound for the denominator of the marginal posterior comes from the following lemma. (Let $\Pi_n$ denote the prior $\Pi_\Theta$ in the local parametrization in terms of $h = n(\theta-\theta_0)$.)
Lemma 3.4. Let the sequence of maps $\theta\mapsto S_n(\theta)$ exhibit the LAE property of (12). Assume that the prior $\Pi_\Theta$ is thick at $\theta_0$ (and denoted by $\Pi_n$ in the local parametrization in terms of $h$). Then

$$ P_0^n\Bigl(\int s_n(h)\,d\Pi_n(h)<a_n\,n^{-1}s_n(0)\Bigr)\to 0, $$

for every sequence $(a_n)$, $a_n\downarrow 0$.

4 Estimation of support boundary points 

In this section we discuss two examples of support boundary estimation for which the likelihood displays an LAE expansion. In Subsection 4.1 the parameter of interest is a shift parameter, while in Subsection 4.2 we consider a semiparametric scaling family.

4.1 Semiparametric shifts 

The so-called location problem is one of the classical problems in statistical inference: let $X_1, X_2, \ldots$ be i.i.d. real-valued random variables, each with marginal $F_\mu:\mathbb{R}\to[0,1]$, where $\mu\in\mathbb{R}$ is the location, i.e. the distribution function is some fixed distribution $F$ shifted over $\mu$: $F_\mu(x) = F(x-\mu)$.

Depending on the nature of $F$, the corresponding location estimation problem can take various forms: for instance, in case $F$ possesses a density $f : \mathbb{R} \to [0,\infty)$ that is symmetric around $0$ (and satisfies the regularity condition $\int (f'/f)^2(x)\,dF(x) < \infty$), the location $\mu$ is estimated at rate $n^{-1/2}$ (equally well whether we know $f$ or not [36]). If $F$ has a support that is contained in a half-line in $\mathbb{R}$ (i.e. if there is a domain boundary), the problem of estimating the location might become easier. Examples have been given in the introduction, where we considered support boundaries of varying degrees of steepness and concluded that the steeper the boundary, the faster the minimax rate of convergence for it.
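To make the rate gain concrete, the following sketch (an illustration under our own assumptions, not part of the argument) simulates $X_i = \mu + E_i$ with $E_i$ standard exponential, so that the density has a jump of size 1 at $\mu$. The sample minimum, which exploits the jump, has error of order $n^{-1}$, while the bias-corrected sample mean, which ignores it, has error of order $n^{-1/2}$. The function name and all numerical settings are hypothetical.

```python
import numpy as np

def boundary_vs_mean_errors(n, reps=2000, mu=1.0, seed=0):
    """Mean absolute errors of two location estimators when X = mu + Exp(1).

    The sample minimum uses the jump of the density at mu; the sample mean
    minus E[Exp(1)] = 1 does not.
    """
    rng = np.random.default_rng(seed)
    x = mu + rng.exponential(scale=1.0, size=(reps, n))
    err_min = np.abs(x.min(axis=1) - mu).mean()
    err_mean = np.abs(x.mean(axis=1) - 1.0 - mu).mean()
    return err_min, err_mean

for n in (100, 400, 1600):
    e_min, e_mean = boundary_vs_mean_errors(n)
    # n * e_min stays near 1 and sqrt(n) * e_mean near E|N(0,1)|, about 0.80
    print(n, n * e_min, np.sqrt(n) * e_mean)
```

Rescaling the two errors by $n$ and $\sqrt{n}$ respectively gives roughly constant values, which is the rate difference described above.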

In this subsection we consider a model of densities with a steep type of boundary, a true discontinuity at $\mu$: we assume that $p(x) = 0$ for $x < \mu$ and $p(\mu) > 0$, while $p : \mathbb{R} \to [0,\infty)$ is continuous at all $x > \mu$. Observed is an i.i.d. sample $X_1, X_2, \ldots$ with marginal $P_0$. The distribution $P_0$ is assumed to have a density of the above form, i.e. with unknown location $\theta$ for a nuisance density $\eta$ in some space $H$. Model distributions $P_{\theta,\eta}$ are then described by densities
$$p_{\theta,\eta} : [\theta,\infty) \to [0,\infty) : x \mapsto \eta(x-\theta),$$
for $\eta \in H$ and $\theta \in \Theta \subset \mathbb{R}$. As for the family $H$ of nuisance densities, our interest does not lie in modelling the tail, so we concentrate on specifying the behaviour at the discontinuity. For that reason (and in order to connect with Theorem 2.2), we impose some conditions on the nuisance space $H$: assume that $\eta : [0,\infty) \to [0,\infty)$ is differentiable and that $\ell(t) = \eta'(t)/\eta(t) + \alpha$ is a bounded continuous function with a limit at infinity. For given $S > 0$, let $\mathscr{L}$ denote the ball of radius $S$ in the space $(C[0,\infty], \|\cdot\|_\infty)$ of continuous functions from the extended half-line to $\mathbb{R}$ with uniform norm. The following lemma maps $\mathscr{L}$ to the space $H$ which we choose to model the nuisance.
Lemma 4.1. Let $\alpha > S$ be fixed. Define $H$ as the image of $\mathscr{L}$ under the map that takes $\ell \in \mathscr{L}$ into densities $\eta_\ell$ by an Esscher transform of the form
$$\eta_\ell(x) = \frac{e^{-\alpha x + \int_0^x \ell(t)\,dt}}{\int_0^\infty e^{-\alpha y + \int_0^y \ell(t)\,dt}\,dy}, \qquad (18)$$
for $x \ge 0$. This map is uniform-to-Hellinger continuous and the space $H$ is a collection of probability densities that are (i) monotone decreasing with sub-exponential tails, (ii) continuously differentiable on $[0,\infty)$ and (iii) log-Lipschitz with constant $\alpha + S$.

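proof One easily shows that $\ell \mapsto \exp(-\alpha x + \int_0^x \ell(t)\,dt)$ is uniform-to-uniform continuous and that $\int_0^\infty \exp(-\alpha y + \int_0^y \ell(t)\,dt)\,dy > 0$, which implies uniform-to-Hellinger continuity of the Esscher transform. For the properties of $\eta_\ell$, note that $\int_0^x \ell(y)\,dy \le Sx < \alpha x$, so that $x \mapsto \exp(-\alpha x + \int_0^x \ell(t)\,dt)$ is bounded by $e^{-(\alpha-S)x}$, which implies that $\ell \mapsto \eta_\ell$ gives rise to a probability density. The density $\eta_\ell$ is differentiable and monotone decreasing. Furthermore, for all $\theta, \theta_0 \in \Theta$ and all $x$ in the common support $[\theta \vee \theta_0, \infty)$,
$$\frac{p_{\theta,\eta}}{p_{\theta_0,\eta}}(x) = \exp\Big(\alpha(\theta-\theta_0) + \int_{x-\theta_0}^{x-\theta}\ell(t)\,dt\Big) \le e^{(\alpha+S)|\theta-\theta_0|},$$
proving the log-Lipschitz property. □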
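The Esscher transform (18) is easy to evaluate numerically. The sketch below builds $\eta_\ell$ on a grid for the hypothetical choice $\ell(t) = S\sin t$ and checks properties (i) and (iii) of Lemma 4.1. It assumes a truncation of $[0,\infty)$ at `x_max` and a trapezoid rule for $\int_0^x \ell(t)\,dt$; both are our discretisation choices, and the function name is illustrative only.

```python
import numpy as np

def esscher_density(ell, alpha, x_max=60.0, num=200_001):
    """Grid approximation of eta_ell(x) = exp(-alpha x + int_0^x ell) / normaliser.

    The half-line is truncated at x_max (assumption: the neglected tail is
    exponentially small because alpha > S).
    """
    x = np.linspace(0.0, x_max, num)
    dx = np.diff(x)
    vals = ell(x)
    # cumulative trapezoid rule for int_0^x ell(t) dt
    cum = np.concatenate(([0.0], np.cumsum(0.5 * (vals[1:] + vals[:-1]) * dx)))
    unnorm = np.exp(-alpha * x + cum)
    normaliser = np.sum(0.5 * (unnorm[1:] + unnorm[:-1]) * dx)
    return x, unnorm / normaliser

alpha, S = 1.0, 0.5
x, eta = esscher_density(lambda t: S * np.sin(t), alpha)
log_eta = np.log(eta)
step = x[1] - x[0]

decreasing = np.all(np.diff(eta) < 0)                 # property (i)
k = 50                                                # shift theta - theta_0 = k * step
lip_ratio = np.max(np.abs(log_eta[k:] - log_eta[:-k])) / ((alpha + S) * k * step)
print(decreasing, lip_ratio)                          # lip_ratio <= 1: property (iii)
```

Since $\alpha - S \le \eta_\ell'/\eta_\ell$ fails only in sign, one also sees that the jump $\eta_\ell(0)$ lies between $\alpha - S$ and $\alpha + S$, which the grid reproduces.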
Since $H$ consists of functions of bounded variation, Theorem V.2.2 in Ibragimov and Has'minskii (1981) [15] confirms that the model exhibits local asymptotic exponentiality in the $\theta$-direction for every fixed $\eta$. In the notation of Definition 2.1, $\gamma_{\theta_0,\eta} = \eta(0)$, i.e. the size of the discontinuity at zero. It is not difficult to find a prior on a space of bounded continuous functions (see, e.g. Lemma 4.6 below), and (Borel) measurability of the Esscher transform as a map between $\mathscr{L}$ and $H$ enables a push-forward prior on $H$.
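The LAE limit behind the next theorem can be checked by simulation: $\Delta_n = n(X_{(1)} - \theta_0)$ should be approximately exponential with rate $\gamma_{\theta_0,\eta_0} = \eta_0(0)$. In the sketch below the nuisance density is the hypothetical mixture $\eta_0(x) = \tfrac12 e^{-x} + \tfrac32 e^{-3x}$; its score $\eta_0'/\eta_0$ takes values in $[-3,-1]$, so $\ell = \eta_0'/\eta_0 + \alpha$ takes values in $[-1,1]$ for $\alpha = 2$, $S = 1$, and the jump is $\eta_0(0) = 2$. Sample sizes and seeds are arbitrary choices.

```python
import numpy as np

def sample_eta0(rng, size):
    """Draw from eta_0(x) = 0.5 e^{-x} + 1.5 e^{-3x}, a 50/50 mixture of Exp(1) and Exp(3)."""
    rates = np.where(rng.random(size) < 0.5, 1.0, 3.0)
    return rng.exponential(scale=1.0 / rates)

def simulate_delta(n, theta0=2.0, reps=4000, seed=1):
    """Draw reps copies of Delta_n = n (X_(1) - theta0) under P_{theta0, eta0}."""
    rng = np.random.default_rng(seed)
    x = theta0 + sample_eta0(rng, (reps, n))
    return n * (x.min(axis=1) - theta0)

delta = simulate_delta(n=500)
gamma = 2.0                                    # eta_0(0)
print(delta.mean(), 1.0 / gamma)               # sample mean vs exponential mean
print((delta > 1.0).mean(), np.exp(-gamma))    # tail probability vs exp(-gamma)
```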

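Theorem 4.1. Let $X_1, X_2, \ldots$ be an i.i.d. sample from the location model introduced above with $P_0 = P_{\theta_0,\eta_0}$ for some $\theta_0 \in \Theta$, $\eta_0 \in H$. Endow $\Theta$ with a prior that is thick at $\theta_0$ and $\mathscr{L}$ with a prior $\Pi_{\mathscr{L}}$ such that $\mathscr{L} \subset \mathrm{supp}(\Pi_{\mathscr{L}})$. Then the marginal posterior for $\theta$ satisfies
$$\sup_A \Big|\Pi\big(n(\theta-\theta_0) \in A \bigm| X_1,\ldots,X_n\big) - \mathrm{Exp}^-_{\Delta_n,\gamma_{\theta_0,\eta_0}}(A)\Big| \xrightarrow{P_0} 0, \qquad (19)$$
where $\Delta_n$ is exponentially distributed with scale $\gamma_{\theta_0,\eta_0} = \eta_0(0)$.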
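A parametric special case shows what (19) asserts. Taking the nuisance known and equal to the hypothetical mixture $\eta_0$ used above, with a normal prior on $\theta$ (also our choice), the sketch computes the posterior density of $h = n(\theta-\theta_0)$ on a grid and its total-variation distance to $\mathrm{Exp}^-_{\Delta_n,\gamma}$, which we read as the distribution with density $\gamma e^{\gamma(h-\Delta_n)}$ on $(-\infty,\Delta_n]$ (the density $\xi_n$ in the proof of Theorem 3.3). It does not implement the semiparametric posterior, which also integrates over $\eta$.

```python
import numpy as np

def trap(f, dh):
    """Trapezoid rule on an equally spaced grid with spacing dh."""
    return dh * (f.sum() - 0.5 * (f[0] + f[-1]))

def log_eta0(y):
    # log of eta_0(y) = 0.5 e^{-y} + 1.5 e^{-3y} for y >= 0
    return np.logaddexp(np.log(0.5) - y, np.log(1.5) - 3.0 * y)

def posterior_tv(n, theta0=2.0, prior_sd=1.0, seed=2):
    """TV distance between the posterior of h = n(theta - theta0) and Exp^-_{Delta_n, gamma}."""
    rng = np.random.default_rng(seed)
    rates = np.where(rng.random(n) < 0.5, 1.0, 3.0)
    x = theta0 + rng.exponential(scale=1.0 / rates)
    delta = n * (x.min() - theta0)
    h = np.linspace(delta - 15.0, delta, 3001)       # the likelihood vanishes for h > Delta_n
    theta = theta0 + h / n
    loglik = np.array([log_eta0(x - t).sum() for t in theta])
    logpost = loglik - 0.5 * ((theta - theta0) / prior_sd) ** 2
    dh = h[1] - h[0]
    post = np.exp(logpost - logpost.max())
    post /= trap(post, dh)
    gamma = np.exp(log_eta0(0.0))                     # eta_0(0) = 2
    limit = gamma * np.exp(gamma * (h - delta))
    return 0.5 * trap(np.abs(post - limit), dh)

print(posterior_tv(200), posterior_tv(2000))
```

The slope of the log-likelihood in $h$ is $-n^{-1}\sum_i (\eta_0'/\eta_0)(X_i - \theta)$, which tends to $-\int \eta_0' = \eta_0(0)$; this is why the rate of the limiting exponential is the size of the jump.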

The proof of Theorem 4.1 consists of a verification of the conditions of Corollary 2.1. The following lemmas make the most elaborate steps explicit. Their proofs can be found in Subsection 5.3.

Lemma 4.2. Hellinger covering numbers for $H$ are finite, i.e. for all $\rho > 0$, $N(\rho, H, d_H) < \infty$.

Assuming that the nuisance prior is such that $\mathscr{L} \subset \mathrm{supp}(\Pi_{\mathscr{L}})$, the following lemma establishes that $\Pi_H(K(\rho)) > 0$, and that condition (ii) of Corollary 2.1 is satisfied.

Lemma 4.3. For every $M > 0$ there exist constants $L_1, L_2 > 0$ such that for small enough $\rho > 0$, $\{\eta_\ell \in H : \|\ell - \ell_0\|_\infty < \rho^2\} \subset K(L_1\rho) \subset K_n(L_2\rho, M)$.
By Lemma 4.1, the log-Lipschitz constant $m_{\theta_0,\eta}$ of Lemma 3.2 equals $\alpha + S$ for every $\eta \in H$, so that the domination condition (iii) and contiguity requirement (iv) of Corollary 2.1 are satisfied. The following lemma shows that condition (v) of Corollary 2.1 is also satisfied.

Lemma 4.4. For all bounded, stochastic sequences $(h_n)$, Hellinger distances between $P_{\theta_n(h_n),\eta}$ and $P_{\theta_0,\eta}$ are of order $n^{-1/2}$ uniformly in $\eta$, i.e. $\sup_{\eta\in H} n^{1/2} H(P_{\theta_n(h_n),\eta}, P_{\theta_0,\eta}) = O(1)$.

To verify condition (vi) of Corollary 2.1 we now check condition (17) of Lemma 3.3.



Lemma 4.5. Let $(M_n)$, $M_n \to \infty$, $M_n < n$ for $n \ge 1$, $M_n = o(n)$ be given. Then there exists a constant $C > 0$ such that the condition of Lemma 3.3 is satisfied.

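proof Note first that for fixed $x$ and $\eta$, the map $\theta \mapsto p_{\theta,\eta}(x)$ is monotone increasing (on $\{\theta \le x\}$). Therefore
$$\sup_{n|\theta-\theta_0|>M_n}\frac{1}{n}\log\prod_{i=1}^n\frac{p_{\theta,\eta}}{p_{\theta_0,\eta}}(X_i) \le \frac{1}{n}\log\prod_{i=1}^n\frac{\eta(X_i-\theta^*)}{\eta(X_i-\theta_0)}\,1_{\{X_{(1)}\ge\theta^*\}},$$
where $\theta^* = X_{(1)}$ if $X_{(1)} \ge \theta_0 + M_n/n$, or $\theta^* = \theta_0 - M_n/n$ otherwise. We first note that $X_{(1)} < \theta_0 + M_n/n$ with probability tending to one. Indeed, shifting the distribution to $\theta_0 = 0$, we calculate
$$P_0^n\Big(X_{(1)} \ge \frac{M_n}{n}\Big) = \Big(1 - \int_0^{M_n/n}\eta_0(x)\,dx\Big)^n \le \exp\Big(-n\int_0^{M_n/n}\eta_0(x)\,dx\Big).$$
By Lemma 5.1 and since $M_n = o(n)$, the right-hand side of the above display is bounded further as follows,
$$\exp\Big(-n\int_0^{M_n/n}\eta_0(x)\,dx\Big) \le \exp\big(-\gamma_{\theta_0,\eta_0}M_n + o(M_n)\big) \le \exp\big(-\tfrac12\gamma_{\theta_0,\eta_0}M_n\big),$$
for large enough $n$. We continue with $\theta^* = \theta_0 - M_n/n$. By absolute continuity of $\eta$ we have
$$\eta(X_i-\theta^*) = \eta(X_i-\theta_0) + \int_{X_i-\theta_0}^{X_i-\theta^*}\eta'(y)\,dy,$$
and the conditions on the nuisance $\eta$ (namely $\eta' = (\ell-\alpha)\eta \le (S-\alpha)\eta$, with $\eta$ decreasing) yield the following bound,
$$\int_{X_i-\theta_0}^{X_i-\theta^*}\eta'(y)\,dy \le (\theta_0-\theta^*)(S-\alpha)\,\eta(X_i-\theta^*).$$
Therefore
$$\frac1n\log\prod_{i=1}^n\frac{\eta(X_i-\theta^*)}{\eta(X_i-\theta_0)}\,1_{\{X_{(1)}\ge\theta^*\}} \le -\log\Big(1 + (\alpha-S)\frac{M_n}{n}\Big) \le -\frac{(\alpha-S)M_n/n}{1+(\alpha-S)M_n/n}.$$
If $C < \alpha - S$, the right-hand side is bounded by $-CM_n/n$ for large enough $n$ (since $M_n/n \to 0$), so the condition of Lemma 3.3 is clearly satisfied. □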
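The key inequality in the proof above is that, for $\delta = M_n/n$, $\eta(u+\delta)/\eta(u) \le 1/(1+(\alpha-S)\delta)$ whenever $\eta'/\eta \le S - \alpha$. A quick numerical check for the hypothetical mixture $\eta_0(x) = \tfrac12 e^{-x} + \tfrac32 e^{-3x}$ (with $\alpha = 2$, $S = 1$) is sketched below. It also shows that $-\log(1+(\alpha-S)\delta) \le -C\delta$ with $C = 0.9 < \alpha - S$ holds only once $\delta$ is small, which is where $M_n = o(n)$ enters.

```python
import numpy as np

ALPHA, S = 2.0, 1.0      # eta'/eta takes values in [-ALPHA - S, -ALPHA + S] = [-3, -1]

def eta(y):
    return 0.5 * np.exp(-y) + 1.5 * np.exp(-3.0 * y)

u = np.linspace(0.0, 20.0, 2001)
worst = {}
rate_ok = {}
for delta in (0.5, 0.1, 0.01, 0.001):                           # delta plays the role of M_n / n
    bound = 1.0 / (1.0 + (ALPHA - S) * delta)
    worst[delta] = np.max(eta(u + delta) / eta(u)) / bound      # should not exceed 1
    rate_ok[delta] = -np.log1p((ALPHA - S) * delta) <= -0.9 * delta
print(worst)
print(rate_ok)   # False for delta = 0.5, True for the smaller values
```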

To demonstrate that priors exist such that $\mathscr{L} \subset \mathrm{supp}(\Pi_{\mathscr{L}})$, an explicit construction based on the distribution of Brownian sample paths is provided in the following lemma.

Lemma 4.6. Let $S > 0$ be given. Let $\{W_t : t \in [0,1]\}$ be Brownian motion on $[0,1]$ and let $Z$ be independent and distributed $N(0,1)$. We define the prior $\Pi_{\mathscr{L}}$ on $\mathscr{L}$ as the distribution of the process
$$\ell(t) = S\,\Psi\big(Z + W_{\Psi(t)}\big),$$
where $\Psi : [-\infty,\infty] \to [-1,1] : x \mapsto 2\arctan(x)/\pi$. Then $\mathscr{L} \subset \mathrm{supp}(\Pi_{\mathscr{L}})$.
proof Consider $C[0,1]$ with the uniform norm and its Borel $\sigma$-algebra, equipped with the law $\Pi$ of $t \mapsto Z + W_t$, as a probability space. Since $\Psi$ is Lipschitz, the map $f$ that takes $C[0,1]$ into $C[0,\infty]$, $Z + W_\cdot \mapsto Z + W_{\Psi(\cdot)}$, is continuous, norm-preserving, and Borel-to-Borel measurable. This enables the view of $C[0,\infty]$ with its Borel $\sigma$-algebra as a probability space, with probability measure $\Pi'(B) = \Pi(f^{-1}(B))$. Similarly, the map $g$ that takes $C[0,\infty]$ into $\mathscr{L}$, $Z + W_{\Psi(\cdot)} \mapsto S\Psi(Z + W_{\Psi(\cdot)})$, is continuous and Borel-to-Borel measurable. We view $\mathscr{L}$ with its Borel $\sigma$-algebra as a probability space, with probability measure $\Pi_{\mathscr{L}}(C) = \Pi'(g^{-1}(C))$. Let $T$ denote a closed set in $\mathscr{L}$ such that $\Pi_{\mathscr{L}}(T) = 1$. Note that $f^{-1}(g^{-1}(T))$ is closed and $\Pi(f^{-1}(g^{-1}(T))) = 1$, so that $\mathrm{supp}(\Pi) \subset f^{-1}(g^{-1}(T))$. Since the support of $\Pi_{\mathscr{L}}$ equals the intersection of all such $T$, $\mathrm{supp}(\Pi) \subset \bigcap_T f^{-1}(g^{-1}(T)) = f^{-1}(g^{-1}(\mathrm{supp}(\Pi_{\mathscr{L}})))$. Since $\mathrm{supp}(\Pi) = C[0,1]$, we have $g(f(y)) \in \mathrm{supp}(\Pi_{\mathscr{L}})$ for every $y \in C[0,1]$. Continuity is preserved under $g \circ f$, so $\mathrm{supp}(\Pi_{\mathscr{L}})$ includes $\mathscr{L}$. □
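The prior of Lemma 4.6 is straightforward to simulate. The sketch below draws one path of $\ell(t) = S\Psi(Z + W_{\Psi(t)})$ on a finite grid of times in $[0,\infty]$, generating the Brownian motion at the time-changed points $\Psi(t) \in [0,1]$ from independent Gaussian increments; the grid, the seed and the function names are our own assumptions.

```python
import numpy as np

def psi(x):
    """Psi: [-inf, inf] -> [-1, 1], x -> 2 arctan(x) / pi (maps [0, inf] onto [0, 1])."""
    return 2.0 * np.arctan(x) / np.pi

def sample_ell(t, S, rng):
    """One draw of ell(t) = S * Psi(Z + W_{Psi(t)}) at the times t (entries may be np.inf)."""
    s = psi(np.asarray(t, dtype=float))          # time change into [0, 1]
    order = np.argsort(s)
    s_sorted = s[order]
    # W at the sorted times via independent N(0, ds) increments, with W_0 = 0
    steps = rng.normal(scale=np.sqrt(np.diff(s_sorted, prepend=0.0)))
    w = np.empty_like(s)
    w[order] = np.cumsum(steps)
    z = rng.normal()
    return S * psi(z + w)

rng = np.random.default_rng(5)
t = np.concatenate((np.linspace(0.0, 50.0, 501), [np.inf]))
ell = sample_ell(t, S=0.5, rng=rng)
print(ell.min(), ell.max())                      # always strictly inside (-0.5, 0.5)
```

Each path is continuous on the extended half-line and takes values in $(-S,S)$, so it lies in $\mathscr{L}$, as the lemma requires.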

4.2 Semiparametric scaling 

Another important statistical problem is related to the scale or dispersion of the probability distribution: let $X_1, X_2, \ldots$ be i.i.d. real-valued random variables, each with marginal $F_\lambda : \mathbb{R} \to [0,1]$, where $\lambda \in (0,\infty)$ is the scale, i.e. the distribution function $F_\lambda$ is some fixed distribution $F$ scaled by $\lambda$: $F_\lambda(x) = F(x/\lambda)$.

Again, depending on the nature of $F$, the corresponding scale estimation problem can take various forms: for instance, in case $F$ possesses an absolutely continuous density $f : \mathbb{R} \to [0,\infty)$ with support $\mathbb{R}$ (that satisfies the regularity condition $\int (1+x^2)(f'/f)^2(x)\,dF(x) < \infty$), the scale $\lambda$ is estimated at rate $n^{-1/2}$ (equally well whether we know $f$ or not, as conjectured in [55], and studied later in [39] and [32]). If $F$ is supported on $[0,\infty)$ (or $(-\infty,0]$), the problem can be reparametrized and viewed as a regular location problem. When $F$ has a support that is a closed interval with one non-zero endpoint (i.e. only one point of the support varies with scale), the problem of estimating the scale might become easier. Probably the best known example of this type is estimation of the scale parameter in the family of uniform distributions on $[0,\lambda]$ ($\lambda > 0$).

In this subsection we consider an extension of this uniform example: we assume that $p(x) > 0$ for $x \in [0,\lambda]$ and $p(x) = 0$ otherwise, while $p : [0,\lambda] \to [0,\infty)$ is continuous at all $x \in (0,\lambda)$. Observed is an i.i.d. sample $X_1, X_2, \ldots$ with marginal $P_0$. The distribution $P_0$ is assumed to have a density of the above form, i.e. with unknown scale $\theta$ for a nuisance density $\eta$ in some space $H$. Model distributions $P_{\theta,\eta}$ are then described by densities
$$p_{\theta,\eta} : [0,\theta] \to [0,\infty) : x \mapsto \frac{1}{\theta}\,\eta\Big(\frac{x}{\theta}\Big), \qquad (20)$$
for $\eta \in H$ and $\theta \in \Theta \subset (0,\infty)$. Fix $S > 0$ and assume that $\eta : [0,1] \to [0,\infty)$ is monotone increasing, differentiable and bounded, and that $\ell(t) = \eta'(t)/\eta(t) - S$ is a bounded continuous function. Let $\mathscr{L}$ denote the ball of radius $S$ in the normed space $(C[0,1], \|\cdot\|_\infty)$ of continuous functions from the unit interval to $\mathbb{R}$ with uniform norm. The following lemma maps $\mathscr{L}$ to the space $H$ with which we choose to model the nuisance.
Lemma 4.7. Define $H$ as the image of $\mathscr{L}$ under the map that takes $\ell \in \mathscr{L}$ into densities $\eta_\ell$ by an Esscher transform of the form
$$\eta_\ell(x) = \frac{e^{Sx + \int_0^x \ell(t)\,dt}}{\int_0^1 e^{Sy + \int_0^y \ell(t)\,dt}\,dy}, \qquad (21)$$
for $x \in [0,1]$. This map is uniform-to-Hellinger continuous and the space $H$ is a collection of probability densities that are (i) monotone increasing and bounded away from zero and infinity and (ii) continuously differentiable on $[0,1]$. Moreover, the resulting densities $p_{\theta,\eta}$ satisfy the log-Lipschitz condition of Lemma 3.2.



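proof The uniform-to-Hellinger continuity of the Esscher transform is proven as in the proof of Lemma 4.1, and $\ell \mapsto \eta_\ell$ gives rise to a probability density trivially. For the properties of $\eta_\ell$, note that $-Sx \le \int_0^x \ell(y)\,dy \le Sx$, so that for $x \in [0,1]$
$$e^{-2S} \le \eta_\ell(x) \le e^{2S}.$$
The density $\eta$ is continuously differentiable, so to check the log-Lipschitz condition fix a neighbourhood of $\theta_0$, say $(\theta_0-\epsilon, \theta_0+\epsilon)$ for some $0 < \epsilon < \theta_0/2$, let $x$ lie in the interval $[0,\theta_0]$ and note that, for $\theta$ in this neighbourhood and $x \le \theta$,
$$\Big|\log\frac{p_{\theta,\eta}}{p_{\theta_0,\eta}}(x)\Big| \le \Big|\log\frac{\theta_0}{\theta}\Big| + \Big|\log\frac{\eta(x/\theta)}{\eta(x/\theta_0)}\Big| \le \frac{|\theta-\theta_0|}{\theta_0-\epsilon} + \frac{2S\,|\theta-\theta_0|}{\theta_0} \le \frac{2+2S}{\theta_0}\,|\theta-\theta_0|.$$
(If $p_{\theta,\eta}(x) = 0$, the log-Lipschitz condition is trivially satisfied.) □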
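As for Lemma 4.1, the bounds in Lemma 4.7 can be checked numerically. The sketch evaluates (21) on a grid of $[0,1]$ with a trapezoid rule (our discretisation) for the hypothetical choice $\ell(t) = S\cos(7t)$, and confirms that the resulting density is non-decreasing and lies between $e^{-2S}$ and $e^{2S}$.

```python
import numpy as np

def esscher_unit(ell, S, num=10_001):
    """Grid version of eta_ell(x) = exp(S x + int_0^x ell) / normaliser on [0, 1]."""
    x = np.linspace(0.0, 1.0, num)
    dx = np.diff(x)
    vals = ell(x)
    # cumulative trapezoid rule for int_0^x ell(t) dt
    cum = np.concatenate(([0.0], np.cumsum(0.5 * (vals[1:] + vals[:-1]) * dx)))
    unnorm = np.exp(S * x + cum)
    normaliser = np.sum(0.5 * (unnorm[1:] + unnorm[:-1]) * dx)
    return x, unnorm / normaliser

S = 0.8
x, eta = esscher_unit(lambda t: S * np.cos(7.0 * t), S)
print(eta.min(), eta.max(), np.exp(-2 * S), np.exp(2 * S))
```

Because $\eta_\ell'/\eta_\ell = S + \ell \ge 0$, the density is monotone increasing, which is the property used in the proof of Lemma 4.11.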
Theorem V.2.2 in [15] verifies local asymptotic exponentiality in the $\theta$-direction for every fixed $\eta$, although in its positive version. This does not pose problems in applying the results of previous sections: we maintain the sign for $h$ and write $\Delta_n = -V_n$, where $V_n = n(\theta_0 - X_{(n)})$. In the notation of Definition 2.1, $\gamma_{\theta_0,\eta} = \eta(1)/\theta_0$, i.e. the scale of the limiting exponential distribution is the size of the discontinuity at the varying endpoint of the support. Again, we use a push-forward prior on $H$ based on a prior for $\mathscr{L}$.
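The positive version of LAE in the scaling model can be illustrated in the same way as in the location model: with $X_i = \theta_0 Y_i$ and $Y_i \sim \eta_0$ on $[0,1]$, $V_n = n(\theta_0 - X_{(n)})$ should be approximately exponential with rate $\gamma_{\theta_0,\eta_0} = \eta_0(1)/\theta_0$. The sketch uses the hypothetical nuisance $\eta_0(y) = (1+2y)/2$, which is increasing, bounded away from zero and has $\eta_0'/\eta_0 - S \in [-1/3, 1]$ for $S = 1$; samples are drawn by inverting $F(y) = (y+y^2)/2$.

```python
import numpy as np

def sample_eta0_scale(rng, size):
    """Inverse-CDF draws from eta_0(y) = (1 + 2y)/2 on [0, 1], where F(y) = (y + y^2)/2."""
    u = rng.random(size)
    return (-1.0 + np.sqrt(1.0 + 8.0 * u)) / 2.0

def simulate_v(n, theta0=2.0, reps=4000, seed=3):
    """Draw reps copies of V_n = n (theta0 - X_(n)) with X_i = theta0 * Y_i, Y_i ~ eta_0."""
    rng = np.random.default_rng(seed)
    x = theta0 * sample_eta0_scale(rng, (reps, n))
    return n * (theta0 - x.max(axis=1))

v = simulate_v(n=1000)
gamma = 1.5 / 2.0                     # eta_0(1) / theta0
print(v.mean(), 1.0 / gamma)
print((v > 1.0).mean(), np.exp(-gamma))
```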

As already noted, our scaling and location problems are both LAE and the parametriza- 
tions and solutions we formulate are closely related. However, the nuisance parametrizations 
are quite different and the relation between the models is a subtle one. Therefore the location 
theorem of the previous subsection and the scaling theorem that follows are very similar in 
appearance, but form the answers to quite distinct questions. 

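Theorem 4.2. Let $X_1, X_2, \ldots$ be an i.i.d. sample from the scale model introduced above with $P_0 = P_{\theta_0,\eta_0}$ for some $\theta_0 \in \Theta$, $\eta_0 \in H$. Endow $\Theta$ with a prior that is thick at $\theta_0$, and $\mathscr{L}$ with a prior $\Pi_{\mathscr{L}}$ such that $\mathscr{L} \subset \mathrm{supp}(\Pi_{\mathscr{L}})$. Then the marginal posterior for $\theta$ satisfies
$$\sup_A \Big|\Pi\big(n(\theta-\theta_0) \in A \bigm| X_1,\ldots,X_n\big) - \mathrm{Exp}^+_{-V_n,\gamma_{\theta_0,\eta_0}}(A)\Big| \xrightarrow{P_0} 0, \qquad (22)$$
where $V_n$ is exponentially distributed with scale $\gamma_{\theta_0,\eta_0} = \eta_0(1)/\theta_0$.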
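For the uniform family ($\eta \equiv 1$ on $[0,1]$, taken as known), a flat prior on $\theta$ gives a posterior for $h = n(\theta-\theta_0)$ proportional to $(\theta_0 + h/n)^{-n}$ on $h \ge -V_n$. The sketch compares it numerically with the exponential density $\gamma e^{-\gamma(h+V_n)}$ on $[-V_n,\infty)$, $\gamma = 1/\theta_0$, which is how we read the limit in (22). The flat prior and the grid are our assumptions, and the nuisance is not integrated out.

```python
import numpy as np

def trap(f, dh):
    """Trapezoid rule on an equally spaced grid with spacing dh."""
    return dh * (f.sum() - 0.5 * (f[0] + f[-1]))

def uniform_scale_tv(n, theta0=2.0, seed=4):
    """TV distance between the flat-prior posterior of h = n(theta - theta0) in the
    Uniform[0, theta] model and gamma * exp(-gamma (h + V_n)) on h >= -V_n."""
    rng = np.random.default_rng(seed)
    x = theta0 * rng.random(n)                        # eta = 1 on [0, 1]
    v = n * (theta0 - x.max())
    h = np.linspace(-v, -v + 30.0 * theta0, 6001)     # the likelihood vanishes for h < -V_n
    theta = theta0 + h / n
    logpost = -n * np.log(theta)                      # flat prior: posterior proportional to theta^{-n}
    dh = h[1] - h[0]
    post = np.exp(logpost - logpost.max())
    post /= trap(post, dh)
    gamma = 1.0 / theta0                              # eta(1) / theta0
    limit = gamma * np.exp(-gamma * (h + v))
    return 0.5 * trap(np.abs(post - limit), dh)

print(uniform_scale_tv(100), uniform_scale_tv(10_000))
```

The log-posterior differs from the exponential limit by about $h^2/(2n\theta_0^2)$, so the distance shrinks roughly like $n^{-1}$ in this toy case.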



The proof of Theorem 4.2 consists of a verification of the conditions of Corollary 2.1 (after the aforementioned modification to comply with the positive version of the LAE expansion). The following lemmas make the most elaborate steps explicit, as in the proof of Theorem 4.1.
Lemma 4.8. Hellinger covering numbers for $H$ are finite, i.e. for all $\rho > 0$, $N(\rho, H, d_H) < \infty$.

Furthermore, prior mass in appropriate Kullback-Leibler neighbourhoods is sufficient.

Lemma 4.9. For every $M > 0$ there exist constants $L_1, L_2 > 0$ such that for small enough $\rho > 0$, $\{\eta_\ell \in H : \|\ell - \ell_0\|_\infty < \rho^2\} \subset K(L_1\rho) \subset K_n(L_2\rho, M)$.

By Lemma 4.7, the model satisfies the Lipschitz condition of Lemma 3.2 with the same Lipschitz constant for every $\eta \in H$, so that the domination condition (iii) and contiguity requirement (iv) of Corollary 2.1 are satisfied.

Lemma 4.10. For all bounded, stochastic sequences $(h_n)$, Hellinger distances between $P_{\theta_n(h_n),\eta}$ and $P_{\theta_0,\eta}$ are of order $n^{-1/2}$ uniformly in $\eta$, i.e. $\sup_{\eta\in H} n^{1/2} H(P_{\theta_n(h_n),\eta}, P_{\theta_0,\eta}) = O(1)$.

To verify condition (vi) of Corollary 2.1 we now check condition (17) of Lemma 3.3.



Lemma 4.11. Let $(M_n)$, $M_n \to \infty$, $M_n < n$ for $n \ge 1$, $M_n = o(n)$ be given. Then there exists a constant $C > 0$ such that the condition of Lemma 3.3 is satisfied.

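proof Note first that for fixed $x$ and $\eta$, the map $\theta \mapsto p_{\theta,\eta}(x)$ is monotone decreasing (on $\{\theta \ge x\}$). Therefore
$$\sup_{n|\theta-\theta_0|>M_n}\frac1n\log\prod_{i=1}^n\frac{p_{\theta,\eta}}{p_{\theta_0,\eta}}(X_i) \le \frac1n\log\prod_{i=1}^n\frac{\eta(X_i/\theta^*)/\theta^*}{\eta(X_i/\theta_0)/\theta_0}\,1_{\{X_{(n)}\le\theta^*\}}(X_n),$$
where $\theta^* = X_{(n)}$ if $X_{(n)} \le \theta_0 - M_n/n$, or $\theta^* = \theta_0 + M_n/n$ otherwise. We first note that $X_{(n)} > \theta_0 - M_n/n$ with probability tending to one. We calculate
$$P_0^n\Big(X_{(n)} \le \theta_0 - \frac{M_n}{n}\Big) = \Big(1 - \int_{\theta_0-M_n/n}^{\theta_0}\frac{1}{\theta_0}\eta_0\Big(\frac{x}{\theta_0}\Big)dx\Big)^n \le \exp\Big(-n\int_{\theta_0-M_n/n}^{\theta_0}\frac{1}{\theta_0}\eta_0\Big(\frac{x}{\theta_0}\Big)dx\Big).$$
By the monotonicity of $\eta_0$ (so that $\eta_0 \ge \eta_0(0) > 0$), the right-hand side of the above display is bounded further by $\exp(-\eta_0(0)M_n/\theta_0)$, for all $n \ge 1$. We continue with $\theta^* = \theta_0 + M_n/n$. By absolute continuity of $\eta$ we have
$$\frac{\eta(X_i/\theta^*)}{\theta^*} = \frac{\eta(X_i/\theta_0)}{\theta_0} + \int_{\theta_0}^{\theta^*} g'(y)\,dy,$$
where $g(y) = \eta(X_i/y)/y$. We note that
$$g'(y) = \eta'(X_i/y)\Big(-\frac{X_i}{y^3}\Big) + \eta(X_i/y)\Big(-\frac{1}{y^2}\Big) \le -\frac{\eta(X_i/y)}{y^2}.$$
Monotonicity of $\eta$ yields the following bound,
$$\int_{\theta_0}^{\theta^*} g'(y)\,dy \le -(\theta^*-\theta_0)\,\frac{\eta(X_i/\theta^*)}{(\theta^*)^2} = -\frac{M_n/n}{\theta^*}\,\frac{\eta(X_i/\theta^*)}{\theta^*}.$$
Therefore
$$\frac1n\log\prod_{i=1}^n\frac{\eta(X_i/\theta^*)/\theta^*}{\eta(X_i/\theta_0)/\theta_0}\,1_{\{X_{(n)}\le\theta^*\}}(X_n) \le -\log\Big(1 + \frac{M_n/n}{\theta_0 + M_n/n}\Big).$$
If $C < 1/(\theta_0+1)$, the condition of Lemma 3.3 is clearly satisfied for large enough $n$. □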

To demonstrate that priors exist such that $\mathscr{L} \subset \mathrm{supp}(\Pi_{\mathscr{L}})$, an explicit construction based on the distribution of Brownian sample paths is provided in the following simplified version of Lemma 4.6.
Lemma 4.12. Let $S > 0$ be given. Let $\{W_t : t \in [0,1]\}$ be Brownian motion on $[0,1]$ and let $Z$ be independent and distributed $N(0,1)$. We define the prior $\Pi_{\mathscr{L}}$ on $\mathscr{L}$ as the distribution of the process
$$\ell(t) = S\,\Psi(Z + W_t),$$
where $\Psi : [-\infty,\infty] \to [-1,1] : x \mapsto 2\arctan(x)/\pi$. Then $\mathscr{L} \subset \mathrm{supp}(\Pi_{\mathscr{L}})$.



5 Proofs 

In this section, several longer proofs of theorems and lemmas in the main text have been 
collected. 



5.1 Proofs of Theorem 3.1 and Lemma 3.1
proof (of Theorem 3.1)

Let $(h_n)$ be bounded by $M$. Let $C > 0$ be given. Given the Hellinger metric entropy and cone-conditions, Lemma 3.2 in Bickel and Kleijn (2012) [1] guarantees that for any $L > 4$ there exists a test sequence $(\phi_n)$ such that (11) is satisfied for all bounded, stochastic $(h_n)$. Based on $C$ and $K$, choose $L > 0$ such that $L^2 > (1+K+C)/C \vee 16$. By Lemma 3.1, the events
$$F_n = \Big\{ X_n : \int_H \prod_{i=1}^n \frac{p_{\theta_n(h_n),\eta}}{p_0}(X_i)\,d\Pi_H(\eta) > e^{-(1+C)n\rho_n^2}\,\Pi_H\big(K_n(\rho_n,M)\big) \Big\}$$
satisfy $P_0^n(F_n^c \cap \{h_n \le \Delta_n\}) \le (C^2 n\rho_n^2)^{-1} \to 0$. Using also the first property of the test sequence in (11), we see that
$$\begin{aligned} P_0^n\Pi_n\big(D^c(L\rho_n) \bigm| \theta=\theta_n(h_n); X_n\big) &= P_0^n\Pi_n\big(D^c(L\rho_n) \bigm| \theta=\theta_n(h_n); X_n\big)\,1_{\{h_n\le\Delta_n\}} \\ &\le P_0^n\Pi_n\big(D^c(L\rho_n) \bigm| \theta=\theta_n(h_n); X_n\big)\,1_{F_n}(X_n)\,1_{\{h_n\le\Delta_n\}}\,(1-\phi_n)(X_n) + o(1). \end{aligned} \qquad (23)$$
Based on the definition of the events $F_n$, the first term on the right is bounded further,

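$$P_0^n\Pi_n\big(D^c(L\rho_n) \bigm| \theta=\theta_n(h_n); X_n\big)\,1_{F_n}(X_n)\,1_{\{h_n\le\Delta_n\}}\,(1-\phi_n)(X_n) \le \frac{e^{(1+C)n\rho_n^2}}{\Pi_H\big(K_n(\rho_n,M)\big)}\, P_0^n\Big(\int_{D^c(L\rho_n)} \prod_{i=1}^n \frac{p_{\theta_n(h_n),\eta}}{p_0}(X_i)\,(1-\phi_n)(X_n)\,d\Pi_H(\eta)\Big). \qquad (24)$$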

By Fubini's theorem and the second property of the test sequence in (11), we obtain,

„ n 

pn / Yl P J^L( Xi ) (1 - <t> n ){X n ) dH H ( V ) 

JDc{Lp n ) fJl PO 

„ n 

< pn / [TM^^) (l _ n )(X n ) dn H (r/) (25) 

Jd°(l p „) f = \ Po 

£> C (I-Pn) 
Combining (23) with (24) and (25), we find that,

(1+C)np2 

iyn„(^(L,„) 1 9 = < nH(KvXpn}M) f - CL = "ID. 

by the choice we made for L above. □ 
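proof (of Lemma 3.1)

Let $C > 0$, $\rho > 0$ and $n \ge 1$ be given. If $\Pi_H(K_n(\rho,M)) = 0$, the assertion holds trivially, so we assume $\Pi_H(K_n(\rho,M)) > 0$ without loss of generality and consider the conditional prior $\Pi_n(A) = \Pi_H(A \mid K_n(\rho,M))$ (for measurable $A \subset H$). Since
$$\int_H \prod_{i=1}^n \frac{p_{\theta_n(h_n),\eta}}{p_0}(X_i)\,d\Pi_H(\eta) \ge \Pi_H\big(K_n(\rho,M)\big) \int_H \prod_{i=1}^n \frac{p_{\theta_n(h_n),\eta}}{p_0}(X_i)\,d\Pi_n(\eta),$$
we may choose to consider only the neighbourhoods $K_n$. Restricting attention to the event $\{h_n \le \Delta_n\}$ and using Jensen's inequality, we obtain
$$\begin{aligned}\log\int_H \prod_{i=1}^n \frac{p_{\theta_n(h_n),\eta}}{p_0}(X_i)\,d\Pi_n(\eta) &\ge \int_H n\mathbb{P}_n 1_{A_{\theta_n(h_n)}}\log\frac{p_{\theta_n(h_n),\eta}}{p_0}\,d\Pi_n(\eta) \\ &\ge \int_H \inf_{|h|\le M} n\mathbb{P}_n 1_{A_{\theta_n(h)}}\log\frac{p_{\theta_n(h),\eta}}{p_0}\,d\Pi_n(\eta) \ge \int_H n\mathbb{P}_n \inf_{|h|\le M} 1_{A_{\theta_n(h)}}\log\frac{p_{\theta_n(h),\eta}}{p_0}\,d\Pi_n(\eta) \\ &\ge -\sqrt{n}\int_H \mathbb{G}_n\Big(\sup_{|h|\le M} -1_{A_{\theta_n(h)}}\log\frac{p_{\theta_n(h),\eta}}{p_0}\Big)\,d\Pi_n(\eta) - n\rho^2,\end{aligned}$$
using the definition of $K_n$ in the last step (see (6)). Then,
$$P_0^n\Big(\int_H \prod_{i=1}^n \frac{p_{\theta_n(h_n),\eta}}{p_0}(X_i)\,d\Pi_n(\eta) < e^{-(1+C)n\rho^2}\Big) \le P_0^n\Big(\int_H -\mathbb{G}_n\Big(\sup_{|h|\le M} -1_{A_{\theta_n(h)}}\log\frac{p_{\theta_n(h),\eta}}{p_0}\Big)\,d\Pi_n(\eta) < -\sqrt{n}\,C\rho^2\Big).$$
By Chebyshev's inequality, Jensen's inequality, Fubini's theorem and the fact that $P_0^n(\mathbb{G}_n Z_n)^2 \le P_0 Z_n^2$ for any $P_0$-square-integrable random variables $Z_n$,
$$P_0^n\Big(\int_H -\mathbb{G}_n\Big(\sup_{|h|\le M} -1_{A_{\theta_n(h)}}\log\frac{p_{\theta_n(h),\eta}}{p_0}\Big)\,d\Pi_n(\eta) < -\sqrt{n}\,C\rho^2\Big) \le \frac{1}{nC^2\rho^4}\int_H P_0\Big(\sup_{|h|\le M} -1_{A_{\theta_n(h)}}\log\frac{p_{\theta_n(h),\eta}}{p_0}\Big)^2 d\Pi_n(\eta) \le \frac{1}{nC^2\rho^2},$$
where the last step follows again from definition (6). □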

5.2 Proofs of Theorems 3.2 and 3.3, and Lemmas 3.2 and 3.4



proof (of Theorem 3.2)

Let $(h_n)$ be bounded in $P_0$-probability. Throughout this proof we write $\theta_n(h_n) = \theta_0 + n^{-1}h_n$. Let $\delta, \epsilon > 0$ be given. There exists a constant $M > 0$ such that $P_0^n(|h_n| > M) < \delta/2$ for all $n \ge 1$. By the consistency assumption, for large enough $n$,
$$P_0^n\Big( \log \Pi_n\big(D(\rho_n) \bigm| \theta = \theta_n(h_n); X_1,\ldots,X_n\big) \ge -\epsilon \Big) \ge 1 - \frac{\delta}{2}.$$
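This implies that the posterior's numerator and denominator are related through
$$P_0^n\Big( 1_{\{|h_n|\le M\}} \int_H \prod_{i=1}^n \frac{p_{\theta_n(h_n),\eta}}{p_0}(X_i)\,d\Pi_H(\eta) \le e^{\epsilon} \int_{D(\rho_n)} \prod_{i=1}^n \frac{p_{\theta_n(h_n),\eta}}{p_0}(X_i)\,d\Pi_H(\eta) \Big) \ge 1-\delta,$$
for this $M$ and all $n$ large enough. We continue with the integral over $D(\rho_n)$ under the restriction $|h_n| \le M$. By stochastic local asymptotic exponentiality for every fixed $\eta$, we have
$$\log\prod_{i=1}^n \frac{p_{\theta_n(h_n),\eta}}{p_0}(X_i) = \log\prod_{i=1}^n \frac{p_{\theta_0,\eta}}{p_0}(X_i) + h_n\gamma_{\theta_0,\eta} + R_n(h_n,\eta;X_n),$$
where the rest-term $R_n(h_n,\eta;X_n)$ converges to zero in $P^n_{\theta_0,\eta}$-probability. Define, for all $\epsilon > 0$, the events
$$F_n(\eta,\epsilon) = \Big\{ X_n : \sup_{|h|\le M} \big|h\gamma_{\theta_0,\eta} - h\gamma_{\theta_0,\eta_0}\big| \le \epsilon \Big\},$$
and note that $F_n^c(\eta_0,\epsilon) = \emptyset$. With the domination condition (iii) of Theorem 2.2, Fatou's lemma yields
$$\limsup_{n\to\infty} \int_{D(\rho_n)} P^n_{\theta_n(h_n),\eta}\big(F_n^c(\eta,\epsilon)\big)\,d\Pi_H(\eta) \le \int \limsup_{n\to\infty} 1_{D(\rho_n)\setminus\{\eta_0\}}(\eta)\, P^n_{\theta_n(h_n),\eta}\big(F_n^c(\eta,\epsilon)\big)\,d\Pi_H(\eta) = 0.$$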
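Combined with Fubini's theorem, this suffices to conclude that
$$\int_{D(\rho_n)} \prod_{i=1}^n \frac{p_{\theta_n(h_n),\eta}}{p_0}(X_i)\,d\Pi_H(\eta) = \int_{D(\rho_n)} \prod_{i=1}^n \frac{p_{\theta_n(h_n),\eta}}{p_0}(X_i)\,1_{F_n(\eta,\epsilon)}(X_n)\,d\Pi_H(\eta) + o_{P_0}(1), \qquad (26)$$
and we continue with the first term on the right-hand side. For every $\eta \in H$, define the events
$$G_n(\eta,\epsilon) = \Big\{ X_n : \sup_{|h|\le M} \big|R_n(h,\eta;X_n)\big| \le \epsilon \Big\},$$
and note that $P^n_{\theta_0,\eta}(G_n^c(\eta,\epsilon)) \to 0$. By the contiguity condition (iv) of Theorem 2.2, the probabilities $P^n_{\theta_n(h_n),\eta}(G_n^c(\eta,\epsilon))$ converge to zero as well. Reasoning as with the events $F_n(\eta,\epsilon)$, we conclude that
$$\int_{D(\rho_n)} \prod_{i=1}^n \frac{p_{\theta_n(h_n),\eta}}{p_0}(X_i)\,1_{F_n(\eta,\epsilon)}(X_n)\,d\Pi_H(\eta) = \int_{D(\rho_n)} \prod_{i=1}^n \frac{p_{\theta_n(h_n),\eta}}{p_0}(X_i)\,1_{G_n(\eta,\epsilon)\cap F_n(\eta,\epsilon)}(X_n)\,d\Pi_H(\eta) + o_{P_0}(1).$$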

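For fixed $n$ and $\eta$ and for all $X_n \in G_n(\eta,\epsilon) \cap F_n(\eta,\epsilon)$, stochastic local asymptotic exponentiality gives
$$\Big| \log\prod_{i=1}^n \frac{p_{\theta_n(h_n),\eta}}{p_0}(X_i) - \log\prod_{i=1}^n \frac{p_{\theta_0,\eta}}{p_0}(X_i) - h_n\gamma_{\theta_0,\eta_0} \Big| \le \big|R_n(h_n,\eta;X_n)\big| + \big|h_n(\gamma_{\theta_0,\eta} - \gamma_{\theta_0,\eta_0})\big| \le 2\epsilon,$$
from which it follows that, writing $G_n \cap F_n$ for $G_n(\eta,\epsilon) \cap F_n(\eta,\epsilon)$,
$$\begin{aligned} e^{h_n\gamma_{\theta_0,\eta_0} - 2\epsilon} \int_{D(\rho_n)} \prod_{i=1}^n \frac{p_{\theta_0,\eta}}{p_0}(X_i)\,1_{G_n\cap F_n}(X_n)\,d\Pi_H(\eta) &\le \int_{D(\rho_n)} \prod_{i=1}^n \frac{p_{\theta_n(h_n),\eta}}{p_0}(X_i)\,1_{G_n\cap F_n}(X_n)\,d\Pi_H(\eta) \\ &\le e^{h_n\gamma_{\theta_0,\eta_0} + 2\epsilon} \int_{D(\rho_n)} \prod_{i=1}^n \frac{p_{\theta_0,\eta}}{p_0}(X_i)\,1_{G_n\cap F_n}(X_n)\,d\Pi_H(\eta). \end{aligned}$$
The integrals can be relieved of the indicators for $G_n \cap F_n$ by reversing the preceding arguments (with $\theta_0$ replacing $\theta_n$), at the expense of an $\exp(o_{P_0}(1))$-factor, leading to
$$e^{h_n\gamma_{\theta_0,\eta_0} - 3\epsilon + o_{P_0}(1)} \int_H \prod_{i=1}^n \frac{p_{\theta_0,\eta}}{p_0}(X_i)\,d\Pi_H(\eta) \le \int_H \prod_{i=1}^n \frac{p_{\theta_n(h_n),\eta}}{p_0}(X_i)\,d\Pi_H(\eta) \le e^{h_n\gamma_{\theta_0,\eta_0} + 3\epsilon + o_{P_0}(1)} \int_H \prod_{i=1}^n \frac{p_{\theta_0,\eta}}{p_0}(X_i)\,d\Pi_H(\eta),$$
for all $h_n \le \Delta_n$. Since this holds for arbitrarily small $\epsilon > 0$, it proves the desired result. □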
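proof (of Theorem 3.3)

Let $C$ be an arbitrary compact subset of $\mathbb{R}$ containing an open neighbourhood of the origin. Denote the (randomly located) distribution $\mathrm{Exp}^-_{\Delta_n,\gamma_{\theta_0,\eta_0}}$ by $\Xi_n$. The prior and marginal posterior for the local parameter $h$ are denoted $\Pi_n$ and $\Pi_n(\,\cdot\,|X_n)$. Conditioned on $C \subset \mathbb{R}$, these measures are denoted $\Xi_n^C$, $\Pi_n^C$ and $\Pi_n^C(\,\cdot\,|X_n)$ respectively. Define the functions $\xi_n, \xi_n^* : \mathbb{R} \to \mathbb{R}$ as
$$\xi_n^*(h) = \gamma_{\theta_0,\eta_0}\,e^{\gamma_{\theta_0,\eta_0}(h-\Delta_n)}, \qquad \xi_n(h) = \xi_n^*(h)\,1_{\{h\le\Delta_n\}},$$
noting that $\xi_n$ is the Lebesgue density for $\Xi_n$. Also define $s_n^*(h) = s_n(h)$ on $(-\infty,\Delta_n]$ and $s_n^*(h) = s_n(0)\exp(h\gamma_{\theta_0,\eta_0} + d_n)$ elsewhere. Finally, define, for every $g, h \in C$ and large enough $n$,
$$f_n(g,h) = \Big(1 - \frac{\xi_n(h)\,s_n(g)\,\pi_n(g)}{\xi_n(g)\,s_n(h)\,\pi_n(h)}\Big)_+ 1_{\{h\le\Delta_n\}}1_{\{g\le\Delta_n\}},$$
and
$$f_n^*(g,h) = \Big(1 - \frac{\xi_n^*(h)\,s_n^*(g)\,\pi_n(g)}{\xi_n^*(g)\,s_n^*(h)\,\pi_n(h)}\Big)_+.$$
By (12) we know that $d_n = \log s_n(\Delta_n) - \log s_n(0) - \Delta_n\gamma_{\theta_0,\eta_0} = o_{P_0}(1)$. Furthermore, for every stochastic sequence $(h_n)$ in $C$,
$$\log s_n^*(h_n) = \log s_n^*(0) + h_n\gamma_{\theta_0,\eta_0} + o_{P_0}(1), \qquad \log\xi_n^*(h_n) = (h_n-\Delta_n)\gamma_{\theta_0,\eta_0} + \log\gamma_{\theta_0,\eta_0}.$$
Since $\xi_n(h)$ and $\xi_n^*(h)$ ($s_n(h)$ and $s_n^*(h)$, respectively) coincide on $\{h \le \Delta_n\}$, $f_n(g,h) \le f_n^*(g,h)$. For any two stochastic sequences $(h_n), (g_n)$ in $C$, $\pi_n(g_n)/\pi_n(h_n) \to 1$ as $n \to \infty$ since $\pi$ is continuous and non-zero at $\theta_0$. Combination with the above display leads to,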
since tt is continuous and non-zero at 9$. Combination with the above display leads to, 
Since $x \mapsto (1-e^x)_+$ is continuous on $(-\infty,\infty)$, we conclude that for any stochastic sequence $(g_n,h_n)$ in $C \times C$, $f_n^*(g_n,h_n) \to 0$ in $P_0$-probability. To render this limit uniform over $C \times C$, continuity is enough: $(g,h) \mapsto \pi_n(g)/\pi_n(h)$ is continuous since the prior is thick. Note that $\xi_n^*(h)/s_n^*(h)$ is of the form $\gamma_{\theta_0,\eta_0}\exp\big(-\gamma_{\theta_0,\eta_0}\Delta_n - R_n(h)\big)/s_n(0)$ for all $h$ and $n \ge 1$, where $R_n(h) = o_{P_0}(1)$ denotes the rest-term in the expansion of $\log s_n^*(h)$ above. Tightness of $\Delta_n$ and $R_n$ implies that $\xi_n^*(h)/s_n^*(h) \in (0,\infty)$ ($P_0^n$-a.s.). Continuity of $h \mapsto s_n(h)$ and $h \mapsto \xi_n(h)$ then implies continuity of $(g,h) \mapsto \big(\xi_n^*(h)s_n^*(g)\big)/\big(\xi_n^*(g)s_n^*(h)\big)$ ($P_0^n$-a.s.). Hence we conclude that
$$\sup_{(g,h)\in C\times C} f_n(g,h) \le \sup_{(g,h)\in C\times C} f_n^*(g,h) \xrightarrow{P_0} 0. \qquad (27)$$
Since $s_n(h)$ is supported on $(-\infty,\Delta_n]$, since $C$ contains a neighbourhood of the origin and since $\Delta_n$ is tight and positive, $\Xi_n(C) > 0$ and $\Pi_n(C|X_n) > 0$ ($P_0^n$-a.s.). So conditioning on $C$ is well-defined (for the relevant cases where $h \le \Delta_n$). Let $\delta > 0$ be given and define events
$$\Omega_n = \Big\{ X_n : \sup_{(g,h)\in C\times C} f_n(g,h) \le \delta \Big\}.$$



Based on $\Omega_n$ and (27), write
$$P_0^n \sup_A \big|\Pi_n^C(h\in A|X_n) - \Xi_n^C(A)\big| \le P_0^n \sup_A \big|\Pi_n^C(h\in A|X_n) - \Xi_n^C(A)\big|\,1_{\Omega_n}(X_n) + o(1).$$
Note that both $\Xi_n^C$ and $\Pi_n^C(\,\cdot\,|X_n)$ have strictly positive densities on $C \cap (-\infty,\Delta_n]$. Therefore, $\Xi_n^C$ is dominated by $\Pi_n^C(\,\cdot\,|X_n)$ for all $n$ large enough. With that observation, the first term on the right-hand side of the above display is calculated to be
$$P_0^n \int \Big(1 - \frac{d\Xi_n^C}{d\Pi_n^C(\,\cdot\,|X_n)}(h)\Big)_+ d\Pi_n^C(h|X_n)\,1_{\Omega_n}(X_n) = P_0^n \int_C \Big(1 - \int_C \frac{s_n(g)\pi_n(g)\xi_n(h)}{s_n(h)\pi_n(h)\xi_n(g)}\,1_{\{g\le\Delta_n\}}\,d\Xi_n^C(g)\Big)_+ 1_{\{h\le\Delta_n\}}\,d\Pi_n^C(h|X_n)\,1_{\Omega_n}(X_n),$$
for large enough $n$. Jensen's inequality leads to
$$P_0^n \sup_A \big|\Pi_n^C(h\in A|X_n) - \Xi_n^C(A)\big|\,1_{\Omega_n}(X_n) \le P_0^n \int\!\!\int f_n(g,h)\,d\Xi_n^C(g)\,d\Pi_n^C(h|X_n)\,1_{\Omega_n}(X_n) \le \delta.$$

We conclude that for every compact $C \subset \mathbb{R}$ containing a neighbourhood of the origin, $P_0^n\|\Pi_n^C(\,\cdot\,|X_n) - \Xi_n^C\| \to 0$, where $\|\cdot\|$ denotes the total-variation distance $\sup_A|\cdot|$. To finish the argument, let $(C_m)$ be a sequence of closed balls centred at the origin with radii $M_m \to \infty$. For each fixed $m \ge 1$ the above holds with $C = C_m$, so if we traverse the sequence $(C_m)$ slowly enough, convergence to zero can still be guaranteed, i.e. there exist $(M_n)$, $M_n \to \infty$, such that $P_0^n\|\Pi_n^{C_{M_n}}(\,\cdot\,|X_n) - \Xi_n^{C_{M_n}}\| \to 0$. Using Lemmas 2.11 and 2.12 in [21], we conclude that the assertion of Theorem 3.3 holds. □
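The total-variation step above rests on the identity $\sup_A|P(A) - Q(A)| = \int (1 - dQ/dP)_+\,dP$ for $Q \ll P$, followed by Jensen's inequality for the convex map $x \mapsto (1-x)_+$. Both facts can be checked on a small discrete example; the sketch below (function names hypothetical, distributions random) does so by brute force over all subsets.

```python
import numpy as np
from itertools import chain, combinations

def tv_sup_over_sets(p, q):
    """sup_A |P(A) - Q(A)| by enumerating all subsets A of a finite space."""
    idx = range(len(p))
    subsets = chain.from_iterable(combinations(idx, r) for r in range(len(p) + 1))
    return max(abs(p[list(a)].sum() - q[list(a)].sum()) for a in subsets)

def tv_via_ratio(p, q):
    """int (1 - dQ/dP)_+ dP, valid when p > 0 wherever q > 0."""
    return np.sum(np.clip(1.0 - q / p, 0.0, None) * p)

rng = np.random.default_rng(6)
p = rng.random(6); p /= p.sum()
q = rng.random(6); q /= q.sum()
print(tv_sup_over_sets(p, q), tv_via_ratio(p, q))

# Jensen for x -> (1 - x)_+: (1 - sum_j w_j r_j)_+ <= sum_j w_j (1 - r_j)_+
w = rng.random(5); w /= w.sum()
r = 2.0 * rng.random(5)
lhs = max(1.0 - np.dot(w, r), 0.0)
rhs = np.dot(w, np.clip(1.0 - r, 0.0, None))
print(lhs <= rhs + 1e-15)
```

In the proof, the inner average over $g$ plays the role of the weights $w_j$, which is why a uniform bound on $f_n$ over $C \times C$ suffices.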
proof (of Lemma 3.2)

Assume first that the "$q$-domination" condition is satisfied. Assertion (i) follows from Jensen's inequality. For the second assertion, fix $\eta \in D(\rho)$ and take a sequence of events $(F_n)$ such that $P^n_{\theta_0,\eta}(F_n) \to 0$. Contiguity now follows from Hölder's inequality (with $1/p + 1/q = 1$),

* (/(n^ «>) , < b *-) 1/, (/ ^^.^'"s^.w^o. 

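Next, assume that the log-Lipschitz condition is satisfied. Let $(h_n)$ be a stochastic sequence bounded by $M > 0$. By the log-Lipschitz condition,
$$\prod_{i=1}^n \frac{p_{\theta_n(h_n),\eta}}{p_{\theta_0,\eta}}(X_i) \le \exp\Big(\frac{|h_n|}{n}\sum_{i=1}^n m_{\theta_0,\eta}(X_i)\Big) \le \exp\Big(\frac{M}{n}\sum_{i=1}^n m_{\theta_0,\eta}(X_i)\Big),$$
for $X_i \in A_{\theta_0}$, which holds with $P_{\theta_0,\eta}$-probability one. Therefore,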



Due to the uniformity of the assumed bound on $P_{\theta_0,\eta}\exp(K m_{\theta_0,\eta})$, this proves (i). For the second assertion, fix $\eta \in D(\rho)$ for some $\rho > 0$ small enough, and take a sequence of events $F_n$ such that $P^n_{\theta_0,\eta}(F_n) \to 0$. Then,
$$P^n_{\theta_n(h_n),\eta}(F_n) \le \int \exp\Big(\frac{M}{n}\sum_{i=1}^n m_{\theta_0,\eta}(x_i)\Big)1_{F_n}(x_1,\ldots,x_n)\prod_{i=1}^n p_{\theta_0,\eta}(x_i)\,dx_1\cdots dx_n \le \big(P_{\theta_0,\eta}\exp(qM m_{\theta_0,\eta})\big)^{1/q}\,P^n_{\theta_0,\eta}(F_n)^{1/p} \to 0,$$
where we have used Hölder's inequality (with $1/p + 1/q = 1$) and Jensen's inequality. The uniform bound on $P_{\theta_0,\eta}\exp(K m_{\theta_0,\eta})$ implies that $\big(P_{\theta_0,\eta}\exp(qM m_{\theta_0,\eta})\big)^{1/q}$ is finite for any $\eta \in D(\rho)$ and $q > 1$. □
proof (of Lemma 3.4)

Let $M > 0$ be given and define the set $C = \{h : -M \le h \le 0\}$. Denote the $o_{P_0}(1)$ rest-term in the integral LAE expansion (12) by $h \mapsto R_n(h)$. By continuity of $\theta \mapsto S_n(\theta)$, the expansion holds uniformly over compacta for large enough $n$ and, in particular, $\sup_{h\in C}|R_n(h)|$ converges to zero in $P_0$-probability. Let $(K_n)$, $K_n \to \infty$, be given. The events $B_n = \{\sup_{h\in C}|R_n(h)| \le K_n/2\}$ satisfy $P_0^n(B_n) \to 1$. Since $\Pi_\Theta$ is thick at $\theta_0$, there exists a $\pi > 0$ such that $\inf_{h\in C} d\Pi_n/dh \ge \pi$ for large enough $n$. Therefore,
$$P_0^n\Big(\int \frac{s_n(h)}{s_n(0)}\,d\Pi_n(h) < e^{-K_n}\Big) \le P_0^n\Big(\Big\{\int_C \frac{s_n(h)}{s_n(0)}\,dh < \pi^{-1}e^{-K_n}\Big\} \cap B_n\Big) + o(1).$$
On $B_n$, the integral LAE expansion is lower bounded so that, for large enough $n$,
$$P_0^n\Big(\Big\{\int_C \frac{s_n(h)}{s_n(0)}\,dh < \pi^{-1}e^{-K_n}\Big\} \cap B_n\Big) \le P_0^n\Big(\int_C e^{h\gamma_{\theta_0,\eta_0}}\,dh < \pi^{-1}e^{-K_n/2}\Big).$$
Since $\int_C e^{h\gamma_{\theta_0,\eta_0}}\,dh \ge M e^{-M\gamma_{\theta_0,\eta_0}}$ and $K_n \to \infty$, we have $\pi^{-1}e^{-K_n/2} < M e^{-M\gamma_{\theta_0,\eta_0}}$ for large enough $n$, so the right-hand side vanishes. Combination of the above with $K_n = -\log a_n$ proves the desired result. □
5.3 Proofs of Subsection 4.1



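proof (of Lemma 4.2)

Given $0 < S < \alpha$, we define $\rho_1 = \alpha - S > 0$. Consider the distribution $Q$ with Lebesgue density $q > 0$ given by $q(x) = \rho_1 e^{-\rho_1 x}$ for $x \ge 0$. Then the family $\mathscr{F} = \{x \mapsto \sqrt{\eta_\ell/q}(x) : \ell \in \mathscr{L}\}$ forms a subset of the collection of all monotone functions $\mathbb{R} \to [0,C]$, where $C$ is fixed and depends on $\alpha$ and $S$. Referring to Theorem 2.7.5 in van der Vaart and Wellner (1996) [38], we conclude that the $L_2(Q)$-bracketing entropy $N_{[\,]}(\epsilon,\mathscr{F},L_2(Q))$ of $\mathscr{F}$ is finite for all $\epsilon > 0$. Noting that
$$d_H(\eta_1,\eta_2)^2 = \int\Big(\sqrt{\frac{\eta_1}{q}}(x) - \sqrt{\frac{\eta_2}{q}}(x)\Big)^2 dQ(x),$$
it follows that $N(\rho,H,d_H) = N(\rho,\mathscr{F},L_2(Q)) \le N_{[\,]}(2\rho,\mathscr{F},L_2(Q)) < \infty$. □

proof (of Lemma 4.3)

Let $\rho$, $0 < \rho < \rho_0$, and $\ell \in \mathscr{L}$ such that $\|\ell - \ell_0\|_\infty < \rho^2$ be given. Then,
$$\Big| \log\frac{p_{\theta_0,\eta}}{p_{\theta_0,\eta_0}}(x) - \int_0^{x-\theta_0}(\ell-\ell_0)(t)\,dt \Big| \le \rho^2 P_0(X-\theta_0) + O(\rho^4), \qquad (28)$$
for all $x \ge \theta_0$. Define, for all $a > S$ and $\ell \in \mathscr{L}$, $z(a,\ell)$ as the logarithm of the normalising factor in (18), i.e. $z(a,\ell) = \log\int_0^\infty e^{-ay + \int_0^y \ell(t)\,dt}\,dy$. Then the relevant log-density-ratio can be written as
$$\log\frac{p_{\theta_0,\eta}}{p_{\theta_0,\eta_0}}(x) = \int_0^{x-\theta_0}(\ell-\ell_0)(t)\,dt - z(\alpha,\ell) + z(\alpha,\ell_0),$$
where only the first term is $x$-dependent. Since $\|\ell - \ell_0\|_\infty < \rho^2$, we have $|\int_0^y(\ell-\ell_0)(t)\,dt| \le \rho^2 y$ for all $y \ge 0$, so that $z(\alpha+\rho^2,\ell_0) \le z(\alpha,\ell) \le z(\alpha-\rho^2,\ell_0)$. Noting that $\partial z/\partial a\,(\alpha,\ell_0) = -P_0(X-\theta_0)$, that the higher derivatives of $z$ in $a$ at $(\alpha,\ell_0)$ are finite (up to sign, they are the cumulants of $X-\theta_0$ under $P_0$), and using the first-order Taylor expansion of $z$ in $a$, we find $z(\alpha\pm\rho^2,\ell_0) = z(\alpha,\ell_0) \mp \rho^2 P_0(X-\theta_0) + O(\rho^4)$, and (28) follows. Next note that, for every $k \ge 1$,
$$P_0\Big|\int_0^{X-\theta_0}(\ell-\ell_0)(t)\,dt\Big|^k \le P_0\Big(\int_0^{X-\theta_0}\rho^2\,dy\Big)^k = \rho^{2k}P_0(X-\theta_0)^k. \qquad (29)$$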



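Using (28), we bound the differences between KL divergences and integrals of scores as follows:
$$-\log\frac{p_{\theta_0,\eta}}{p_{\theta_0,\eta_0}}(x) \le -\int_0^{x-\theta_0}(\ell-\ell_0)(t)\,dt + \rho^2\big(P_0(X-\theta_0) + O(\rho^2)\big),$$
$$\Big|\log\frac{p_{\theta_0,\eta}}{p_{\theta_0,\eta_0}}(x)\Big| \le \Big|\int_0^{x-\theta_0}(\ell-\ell_0)(t)\,dt\Big| + \rho^2\big(P_0(X-\theta_0) + O(\rho^2)\big),$$
and, combining with the bounds (29), we see that
$$-P_0\log\frac{p_{\theta_0,\eta}}{p_{\theta_0,\eta_0}} \le 2\rho^2\big(P_0(X-\theta_0) + O(\rho^2)\big),$$
$$P_0\Big(\log\frac{p_{\theta_0,\eta}}{p_{\theta_0,\eta_0}}\Big)^2 \le \rho^4\Big(P_0(X-\theta_0)^2 + 3\big(P_0(X-\theta_0)\big)^2 + O(\rho^2)\Big),$$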
which proves the first inclusion. Let $M > 0$. Note that $A_\theta = [\theta,\infty)$ and that, for $|h| \le M$, the log-Lipschitz property of Lemma 4.1 gives $1_{A_{\theta_n(h)}}\log(p_{\theta_0,\eta}/p_{\theta_n(h),\eta}) \le (\alpha+S)M/n$. Splitting the log-ratio through $p_{\theta_0,\eta}$, we obtain
$$P_0\Big(\sup_{|h|\le M} -1_{A_{\theta_n(h)}}\log\frac{p_{\theta_n(h),\eta}}{p_{\theta_0,\eta_0}}\Big) \le -P_0\log\frac{p_{\theta_0,\eta}}{p_{\theta_0,\eta_0}} + \frac{(\alpha+S)M}{n},$$
$$P_0\Big(\sup_{|h|\le M} -1_{A_{\theta_n(h)}}\log\frac{p_{\theta_n(h),\eta}}{p_{\theta_0,\eta_0}}\Big)^2 \le P_0\Big(\log\frac{p_{\theta_0,\eta}}{p_{\theta_0,\eta_0}}\Big)^2 + \frac{2(\alpha+S)M}{n}\,P_0\Big|\log\frac{p_{\theta_0,\eta}}{p_{\theta_0,\eta_0}}\Big| + \frac{(\alpha+S)^2M^2}{n^2},$$
implying the existence of a constant $L_2$. □

proof (of Lemma 4.4)

Fix $n$ and $\omega$; write $h_n$ for $h_n(\omega)$. First we consider the case that $h_n \ge 0$: for $x \ge \theta_0$,
$$\big(\eta^{1/2}(x-\theta_n(h_n)) - \eta^{1/2}(x-\theta_0)\big)^2 = \eta(x-\theta_0)\,1_{[\theta_0,\theta_n(h_n))}(x) + \big(\eta^{1/2}(x-\theta_n(h_n)) - \eta^{1/2}(x-\theta_0)\big)^2\,1_{[\theta_n(h_n),\infty)}(x).$$



To upper bound the second term, we use the absolute continuity of $\eta^{1/2}$,
$$\big|\eta^{1/2}(x-\theta_0) - \eta^{1/2}(x-\theta_n(h_n))\big| \le \int_0^{h_n/n}\big|(\eta^{1/2})'\big|(z + x - \theta_n(h_n))\,dz,$$
and then by Jensen's inequality,
$$\big(\eta^{1/2}(x-\theta_0) - \eta^{1/2}(x-\theta_n(h_n))\big)^2 \le \frac{h_n}{4n}\int_0^{h_n/n}\frac{(\eta')^2}{\eta}(z + x - \theta_n(h_n))\,dz.$$
Similarly, for $h_n < 0$ and $x \ge \theta_n(h_n)$,

(^/ 2 (x-9 Q )-r ] l / 2 {x-9 n {h n ))) 2 

< r/(x - 9 n {h n ))l [6n Q ln)M {x) - r/(x - 9 n (-M))l [dn{ _ M)M (x) 



M 



+ ri(x-9 n (-M))l [dni _ M)M + — 



(rf) 



f\2 



■{z + x - 9 ) dz l[0 O)DO )(x). 



4n Jo ^ 

Combining these results, we obtain a bound for the squared Hellinger distance: 



r0n(M) r0 

H 2 ( p e n (h n ), v , Pe , v ) < Vip- 9 ) dx + / 

Je Je n (-M) 



t](x - 9 n (-M))dx 



+ 1 {h n <0} 
+ 1 {h„>0} 



n \h-n ) 



rj(x - 9 n (h n )) dx - l{h n <o} / r/(x - 9 n (-M)) dx 



0o 



M 



(rf) 



+ 1 {^<0} / TZ 



00 M 

n(h n ) 4n Ju V 
/\2 



i\2 



(30) 



■{z + x — 9 n (h n )) dz dx 



9i, 



M 



M f~ (rf) 



4n Jq V 



■(z + x — 9q) dz dx. 
As for the first two terms on the right-hand side of (30), we note the following inequality:



e„(M) 



00 



rj(x — Oq) dx + 



M M 2 f 00 , 
V (x - 9 n {—M)) dx < 2 ld(hV — + — / IV y dy, 



by Lemma 5.1. Furthermore, by shifting appropriately, we find that the third and fourth term of (30) satisfy the bound
$$1_{\{h_n<0\}}\Big(\int_{\theta_n(h_n)}^{\theta_0}\eta(x-\theta_n(h_n))\,dx - \int_{\theta_n(-M)}^{\theta_0}\eta(x-\theta_n(-M))\,dx\Big) = 1_{\{h_n<0\}}\Big(\int_0^{-h_n/n}\eta(y)\,dy - \int_0^{M/n}\eta(y)\,dy\Big) = -1_{\{h_n<0\}}\int_{-h_n/n}^{M/n}\eta(y)\,dy \le 0,$$
(where it is noted that the $h_n$-dependent integral in the above display is well defined for any $h_n$). Finally, the fifth and sixth term of (30) are bounded by the Fisher information for location associated with $\eta$:



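$$
\int_0^{\infty}\int_0^{M/n}\Bigl(\frac{\eta'}{\eta}\Bigr)^2\eta\,(z+x)\,dz\,dx
= \int_0^{M/n}\int_z^{\infty}\Bigl(\frac{\eta'}{\eta}\Bigr)^2\eta\,(x)\,dx\,dz
\le \frac{M}{n}\int_0^{\infty}\Bigl(\frac{\eta'}{\eta}\Bigr)^2\eta\,(x)\,dx;
$$
after the substitutions $x \mapsto x+\theta_n(h_n)$ and $x \mapsto x+\theta_0$, respectively, the double integrals in the fifth and sixth terms take the form of the left-hand side, and at most one of these two terms is non-zero.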

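Combining, we obtain the following upper bound for the relevant Hellinger distance:
$$
H^2(p_{\theta_n(h_n),\eta}, p_{\theta_0,\eta})
\le 2\eta(0)\frac{M}{n} + \frac{2M}{n}\Bigl(\int_0^{\infty}|\eta'(x)|\,dx + \frac{M}{8n}\int_0^{\infty}\Bigl(\frac{\eta'(x)}{\eta(x)}\Bigr)^2\eta(x)\,dx\Bigr),
$$
which proves the lemma upon noting that $|\eta'(x)| = \eta(x)\,|\xi(x)-\alpha| \le \eta(x)(\alpha+S)$. □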
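As a sanity check on the order of this bound, consider the hypothetical choice $\eta(x) = e^{-x}$ on $[0,\infty)$, for which $\eta(0) = 1$, $\int_0^\infty|\eta'| = 1$ and $\int_0^\infty(\eta'/\eta)^2\eta = 1$, so that the bound $2\eta(0)M/n + (2M/n)\int|\eta'| + (M^2/(4n^2))\int(\eta'/\eta)^2\eta$ reads $4M/n + M^2/(4n^2)$. Assuming the convention $H^2(p,q) = \int(\sqrt p-\sqrt q)^2$ (with a factor $1/2$ the comparison only improves), a shift $\delta = |h|/n$ gives the closed form $(1-e^{-\delta}) + (1-e^{-\delta/2})^2$. The sketch below compares the two and confirms the closed form by quadrature.

```python
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson rule for the integral of f over [a, b] (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def hellinger_sq_closed(delta):
    """H^2 between eta(. - delta) and eta for eta(x) = exp(-x) 1{x >= 0}, delta >= 0."""
    return (1.0 - math.exp(-delta)) + (1.0 - math.exp(-delta / 2.0)) ** 2


def hellinger_sq_numeric(delta, upper=60.0):
    # On [0, delta) the shifted density vanishes, so only eta(x) contributes.
    left = simpson(lambda x: math.exp(-x), 0.0, delta)
    # On [delta, upper] both densities are positive; the tail beyond 60 is negligible.
    right = simpson(lambda x: (math.exp(-(x - delta) / 2) - math.exp(-x / 2)) ** 2, delta, upper)
    return left + right


def location_bound(M, n, eta0=1.0, total_variation=1.0, fisher=1.0):
    """2 eta(0) M/n + (2M/n) int|eta'| + (M^2 / (4 n^2)) int (eta'/eta)^2 eta."""
    return 2 * eta0 * M / n + 2 * M / n * total_variation + M ** 2 / (4 * n ** 2) * fisher


M = 5.0
for n in (10, 100, 1000):
    # By translation invariance, shifts h and -h give the same Hellinger distance.
    worst = max(hellinger_sq_closed(abs(h) / n) for h in (-M, -1.0, 0.5, M))
    print(f"n={n}: max H^2 = {worst:.3e}, bound = {location_bound(M, n):.3e}, n*H^2 = {n * worst:.3f}")
```

Note that $n H^2$ stays bounded: because of the jump of $\eta$ at $0$, $H^2$ grows linearly in the shift $\delta$ rather than quadratically as it would for a smooth density, which is consistent with the $h/n$ localisation used throughout.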



5.4 Proofs of Subsection 4.2

Proof (of Lemma 4.8)

Denote by $Q$ the distribution with density $\eta_0 = \eta_{\xi_0}$. Then the family $\mathscr{F} = \{x \mapsto \sqrt{\eta_\xi/\eta_0}\,(x) : \xi \in \mathscr{L}\}$ forms a subset of the collection $C^1_M([0,1])$, where $M$ is fixed and depends on $S$. Referring to Corollary 2.7.2 in [38], we conclude that the $L_2(Q)$-bracketing entropy $N_{[\,]}(\epsilon, \mathscr{F}, L_2(Q))$ of $\mathscr{F}$ is finite for all $\epsilon > 0$. Noting that
$$
d_H(\eta_1,\eta_2)^2 = \int\Bigl(\sqrt{\frac{\eta_1}{\eta_0}}(x) - \sqrt{\frac{\eta_2}{\eta_0}}(x)\Bigr)^2\,dQ(x),
$$
it follows that $N(\rho, H, d_H) = N(\rho, \mathscr{F}, L_2(Q)) \le N_{[\,]}(2\rho, \mathscr{F}, L_2(Q)) < \infty$.
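□

Proof (of Lemma 4.9)

Let $\rho > 0$ and $\xi \in \mathscr{L}$ such that $\|\xi-\xi_0\|_\infty < \rho^2$ be given. Then,
$$
\Bigl|\log\frac{p_{\theta_0,\eta}}{p_{\theta_0,\eta_0}}(x) - \int_0^{x/\theta_0}(\xi-\xi_0)(t)\,dt\Bigr| \le \rho^2 P_0(X/\theta_0) + O(\rho^4), \tag{31}
$$
for all $x \in [0,\theta_0]$. Define, for all $a \in \mathbb{R}$ and $\xi \in \mathscr{L}$,
$$
z(a,\xi) = \log\int_0^1 e^{ay + \int_0^y \xi(t)\,dt}\,dy.
$$
Then the relevant log-density ratio can be written as
$$
\log\frac{p_{\theta_0,\eta}}{p_{\theta_0,\eta_0}}(x) = \int_0^{x/\theta_0}(\xi-\xi_0)(t)\,dt - z(S,\xi) + z(S,\xi_0),
$$
where only the first term is $x$-dependent. Assume that $\xi \in \mathscr{L}$ is such that $\|\xi-\xi_0\|_\infty < \rho^2$. Then $\bigl|\int_0^y(\xi-\xi_0)(t)\,dt\bigr| < \rho^2 y$, so that $z(S-\rho^2,\xi_0) \le z(S,\xi) \le z(S+\rho^2,\xi_0)$. Noting that $\partial z/\partial a\,(S,\xi_0) = P_0(X/\theta_0)$ and that the higher derivatives of $z$ in $a$ are finite (they are cumulants of the bounded variable $X/\theta_0$), the first-order Taylor expansion of $z$ in $a$ gives $z(S\pm\rho^2,\xi_0) = z(S,\xi_0) \pm \rho^2 P_0(X/\theta_0) + O(\rho^4)$, and (31) follows.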
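The two facts used here can be checked numerically: the sandwich $z(S-\rho^2,\xi_0) \le z(S,\xi) \le z(S+\rho^2,\xi_0)$ and $\partial z/\partial a\,(S,\xi_0) = P_0(X/\theta_0)$. The sketch assumes, as the decomposition of the log-density ratio above suggests, that under $P_0$ the variable $Y = X/\theta_0$ has density proportional to $e^{Sy + \int_0^y\xi_0}$ on $[0,1]$. Then $a \mapsto z(S+a,\xi_0) - z(S,\xi_0)$ is the cumulant generating function of $Y$, whose second derivative is a variance of a $[0,1]$-valued variable, hence at most $1/4$; the Taylor remainder is therefore at most $\rho^4/8$. The choices $S = 1$, $\xi_0(t) = 0.3\cos(2\pi t)$ and $\xi = \xi_0 + r\sin(5t)$ with $r = \rho^2$ are hypothetical.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule for the integral of f over [a, b] (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


S = 1.0  # hypothetical value of the constant S


def Xi0(y):
    """int_0^y xi0(t) dt for the hypothetical xi0(t) = 0.3 cos(2 pi t)."""
    return 0.3 * math.sin(2 * math.pi * y) / (2 * math.pi)


def Xi(y, r):
    """int_0^y xi(t) dt for xi = xi0 + r sin(5t), so that ||xi - xi0||_inf <= r."""
    return Xi0(y) + r * (1 - math.cos(5 * y)) / 5


def z(a, Xi_fn):
    """z(a, xi) = log int_0^1 exp(a y + int_0^y xi(t) dt) dy."""
    return math.log(simpson(lambda y: math.exp(a * y + Xi_fn(y)), 0.0, 1.0))


def mean_Y():
    """P0(X / theta0): mean of Y under the density proportional to exp(S y + Xi0(y))."""
    weight = lambda y: math.exp(S * y + Xi0(y))
    return simpson(lambda y: y * weight(y), 0.0, 1.0) / simpson(weight, 0.0, 1.0)


step = 1e-4
dz_da = (z(S + step, Xi0) - z(S - step, Xi0)) / (2 * step)  # central difference in a
print(f"dz/da(S, xi0) = {dz_da:.6f}, P0(X/theta0) = {mean_Y():.6f}")

for r in (0.01, 0.05, 0.2):
    xi_r = lambda y, r=r: Xi(y, r)
    print(f"r={r}: {z(S - r, Xi0):.6f} <= {z(S, xi_r):.6f} <= {z(S + r, Xi0):.6f}")
```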
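Next note that, for every $k \ge 1$,
$$
P_0\Bigl|\int_0^{X/\theta_0}(\xi-\xi_0)(t)\,dt\Bigr|^k \le \rho^{2k}\,P_0\Bigl(\int_0^{X/\theta_0}dy\Bigr)^k = \rho^{2k}\,P_0(X/\theta_0)^k. \tag{32}
$$
Using (31), we bound the differences between the log-likelihood ratios (and their squares) and the corresponding integrals of scores as follows:
$$
\begin{aligned}
\Bigl|\log\frac{p_{\theta_0,\eta}}{p_{\theta_0,\eta_0}}(x) - \int_0^{x/\theta_0}(\xi-\xi_0)(t)\,dt\Bigr| &\le \rho^2\bigl(P_0(X/\theta_0) + O(\rho^2)\bigr),\\
\Bigl|\Bigl(\log\frac{p_{\theta_0,\eta}}{p_{\theta_0,\eta_0}}(x)\Bigr)^2 - \Bigl(\int_0^{x/\theta_0}(\xi-\xi_0)(t)\,dt\Bigr)^2\Bigr| &\le \rho^2\bigl(P_0(X/\theta_0) + O(\rho^2)\bigr)\Bigl(2\Bigl|\int_0^{x/\theta_0}(\xi-\xi_0)(t)\,dt\Bigr| + \rho^2\bigl(P_0(X/\theta_0) + O(\rho^2)\bigr)\Bigr),
\end{aligned}
$$
and, combining with the bounds (32), we see that
$$
\begin{aligned}
-P_0\log\frac{p_{\theta_0,\eta}}{p_{\theta_0,\eta_0}} &\le 2\rho^2\bigl(P_0(X/\theta_0) + O(\rho^2)\bigr),\\
P_0\Bigl(\log\frac{p_{\theta_0,\eta}}{p_{\theta_0,\eta_0}}\Bigr)^2 &\le \rho^4\bigl(P_0(X/\theta_0)^2 + 3\,(P_0(X/\theta_0))^2 + O(\rho^2)\bigr),
\end{aligned}
$$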



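which proves the first inclusion. Let $M > 0$. Note that $A_{\theta_0} = [0,\theta_0]$, and that for large enough $n$,
$$
\begin{aligned}
\sup_{|h|\le M} -1_{A_{\theta_n(h)}}\log\frac{p_{\theta_n(h),\eta}}{p_{\theta_0,\eta_0}}
&= \sup_{|h|\le M} 1_{A_{\theta_n(h)}}\log\frac{p_{\theta_0,\eta_0}}{p_{\theta_n(h),\eta}}
= \sup_{|h|\le M} 1_{A_{\theta_n(h)}}\Bigl(\log\frac{p_{\theta_0,\eta_0}}{p_{\theta_0,\eta}} + \log\frac{p_{\theta_0,\eta}}{p_{\theta_n(h),\eta}}\Bigr)\\
&\le \frac{(2+8S)M}{n} + \log\frac{p_{\theta_0,\eta_0}}{p_{\theta_0,\eta}},
\end{aligned}
$$
so that
$$
\begin{aligned}
P_0\Bigl(\sup_{|h|\le M} -1_{A_{\theta_n(h)}}\log\frac{p_{\theta_n(h),\eta}}{p_{\theta_0,\eta_0}}\Bigr) &\le -P_0\log\frac{p_{\theta_0,\eta}}{p_{\theta_0,\eta_0}} + \frac{(2+8S)M}{n},\\
P_0\Bigl(\sup_{|h|\le M} -1_{A_{\theta_n(h)}}\log\frac{p_{\theta_n(h),\eta}}{p_{\theta_0,\eta_0}}\Bigr)^2 &\le P_0\Bigl(\log\frac{p_{\theta_0,\eta}}{p_{\theta_0,\eta_0}}\Bigr)^2 + \frac{(4+16S)M}{n}\,P_0\Bigl|\log\frac{p_{\theta_0,\eta}}{p_{\theta_0,\eta_0}}\Bigr| + \frac{(2+8S)^2M^2}{n^2},
\end{aligned}
$$
implying the existence of a constant $L_2$. □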
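Proof (of Lemma 4.10)

Note that the elements of the nuisance space $H$ are uniformly bounded by $e^{2S}$. Fix $n$ and $\omega$; write $h_n$ for $h_n(\omega)$. First we consider the case that $h_n > 0$:
$$
\Bigl(\frac{\eta^{1/2}(x/\theta_n(h_n))}{\theta_n^{1/2}(h_n)} - \frac{\eta^{1/2}(x/\theta_0)}{\theta_0^{1/2}}\Bigr)^2
= \frac{\eta(x/\theta_n(h_n))}{\theta_n(h_n)}\,1_{[\theta_0,\theta_n(h_n)]}(x)
+ \Bigl(\frac{\eta^{1/2}(x/\theta_n(h_n))}{\theta_n^{1/2}(h_n)} - \frac{\eta^{1/2}(x/\theta_0)}{\theta_0^{1/2}}\Bigr)^2\,1_{[0,\theta_0]}(x).
$$
Note that the first term is bounded from above by $(e^{2S}/\theta_0)\,1_{[\theta_0,\theta_n(M)]}(x)$. To upper bound the second term, we use the absolute continuity of $\eta^{1/2}$. Let $g(y) = (\eta(x/y)/y)^{1/2}$; then
$$
\Bigl|\frac{\eta^{1/2}(x/\theta_n(h_n))}{\theta_n^{1/2}(h_n)} - \frac{\eta^{1/2}(x/\theta_0)}{\theta_0^{1/2}}\Bigr|
= \Bigl|\int_{\theta_0}^{\theta_n(h_n)}g'(y)\,dy\Bigr| \le \int_{\theta_0}^{\theta_n(M)}|g'(y)|\,dy.
$$
By the definition of the nuisance space, for $y \in [\theta_0,\theta_n(M)]$ and $x \le \theta_0$,
$$
|g'(y)| \le \frac{e^{S}}{\theta_0^{3/2}}(S+1),
$$
and then
$$
\Bigl(\frac{\eta^{1/2}(x/\theta_n(h_n))}{\theta_n^{1/2}(h_n)} - \frac{\eta^{1/2}(x/\theta_0)}{\theta_0^{1/2}}\Bigr)^2 \le \frac{M^2e^{2S}(S+1)^2}{n^2\theta_0^3}.
$$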

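Similarly, for $h_n < 0$,
$$
\Bigl(\frac{\eta^{1/2}(x/\theta_n(h_n))}{\theta_n^{1/2}(h_n)} - \frac{\eta^{1/2}(x/\theta_0)}{\theta_0^{1/2}}\Bigr)^2
\le \frac{e^{2S}}{\theta_0}\,1_{[\theta_0-M/n,\,\theta_0]}(x) + \frac{M^2e^{2S}(S+1)^2}{n^2(\theta_0-M/n)^3}\,1_{[0,\theta_0]}(x).
$$
Combining these results, we obtain a bound for the squared Hellinger distance:
$$
H^2(p_{\theta_n(h_n),\eta}, p_{\theta_0,\eta}) \le \frac{2Me^{2S}}{n\theta_0} + \frac{M^2e^{2S}(S+1)^2}{n^2\theta_0^2} + \frac{M^2e^{2S}\theta_0(S+1)^2}{n^2(\theta_0-M/n)^3}.
$$
□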
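A numerical check of the scaling bound $H^2 \le 2Me^{2S}/(n\theta_0) + M^2e^{2S}(S+1)^2/(n^2\theta_0^2) + M^2e^{2S}\theta_0(S+1)^2/(n^2(\theta_0-M/n)^3)$, under stated assumptions: $\theta_0 = 1$, $S = 1$, the hypothetical $\xi(t) = 0.5\,S\cos(3t)$ (so $\|\xi\|_\infty \le S$), $\theta_n(h) = \theta_0 + h/n$, and, consistent with the definition of $z$ in the proof of Lemma 4.9, $\eta(y) = \exp(Sy + \int_0^y\xi - z(S,\xi))$ on $[0,1]$ with $p_\theta(x) = \theta^{-1}\eta(x/\theta)$ on $[0,\theta]$ and $H^2(p,q) = \int(\sqrt p-\sqrt q)^2$. The integral is split at $\min(\theta,\theta_0)$, beyond which only one density is positive.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson rule for the integral of f over [a, b] (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


S = 1.0       # hypothetical bound on ||xi||_inf
THETA0 = 1.0  # hypothetical true scale parameter


def Xi(y):
    """int_0^y xi(t) dt for the hypothetical xi(t) = 0.5 * S * cos(3t)."""
    return 0.5 * S * math.sin(3 * y) / 3


Z = math.log(simpson(lambda y: math.exp(S * y + Xi(y)), 0.0, 1.0))  # z(S, xi)


def eta(y):
    """Assumed nuisance density exp(S y + Xi(y) - z(S, xi)) on [0, 1]."""
    return math.exp(S * y + Xi(y) - Z) if 0.0 <= y <= 1.0 else 0.0


def density(x, theta):
    """p_theta(x) = eta(x / theta) / theta on [0, theta]."""
    return eta(x / theta) / theta if 0.0 <= x <= theta else 0.0


def hellinger_sq(theta, theta0=THETA0):
    lo, hi = min(theta, theta0), max(theta, theta0)
    both = lambda x: (math.sqrt(density(x, theta)) - math.sqrt(density(x, theta0))) ** 2
    # On (lo, hi] only the density with the larger support is positive.
    return simpson(both, 0.0, lo) + simpson(lambda x: density(x, hi), lo, hi)


def scaling_bound(M, n, theta0=THETA0):
    c = math.exp(2 * S) * (S + 1) ** 2
    return (2 * M * math.exp(2 * S) / (n * theta0)
            + M ** 2 * c / (n ** 2 * theta0 ** 2)
            + M ** 2 * c * theta0 / (n ** 2 * (theta0 - M / n) ** 3))


M = 3.0
for n in (50, 200, 1000):
    worst = max(hellinger_sq(THETA0 + h / n) for h in (-M, -1.0, 0.5, M))
    print(f"n={n}: max H^2 = {worst:.3e}, bound = {scaling_bound(M, n):.3e}")
```

For this choice the bound is far from tight; the check only illustrates that $H^2$ is of order $M/n$, as the leading term of the bound indicates.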



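Lemma 5.1. For every differentiable $\eta$ and $\epsilon > 0$ the following inequalities hold:
$$
\eta(0)\epsilon - \epsilon\int_0^{\epsilon}|\eta'(x)|\,dx \le \int_0^{\epsilon}\eta(x)\,dx \le \eta(0)\epsilon + \epsilon\int_0^{\epsilon}|\eta'(x)|\,dx.
$$

Proof. Integration by parts yields
$$
\int_0^{\epsilon}\eta(x)\,dx = \eta(0)\epsilon + \int_0^{\epsilon}(\epsilon-x)\,\eta'(x)\,dx.
$$
Since $-\epsilon|\eta'(x)| \le (\epsilon-x)\,\eta'(x) \le \epsilon|\eta'(x)|$ for $x \in [0,\epsilon]$, the assertion holds. □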
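The integration-by-parts identity and the two-sided bound of Lemma 5.1 can be checked numerically. The sketch below uses a hypothetical differentiable test function, $\eta(x) = e^{-x}(1 + 0.5\sin 3x)$, and Simpson quadrature, so the comparisons hold up to quadrature error.

```python
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson rule for the integral of f over [a, b] (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# Hypothetical differentiable test function and its derivative.
def eta(x):
    return math.exp(-x) * (1 + 0.5 * math.sin(3 * x))


def deta(x):
    return math.exp(-x) * (1.5 * math.cos(3 * x) - 1 - 0.5 * math.sin(3 * x))


def lemma_5_1(eps):
    """Return (lower bound, integral of eta over [0, eps], upper bound, by-parts value)."""
    tv = simpson(lambda x: abs(deta(x)), 0.0, eps)  # int_0^eps |eta'(x)| dx
    integral = simpson(eta, 0.0, eps)
    by_parts = eta(0.0) * eps + simpson(lambda x: (eps - x) * deta(x), 0.0, eps)
    return eta(0.0) * eps - eps * tv, integral, eta(0.0) * eps + eps * tv, by_parts


for eps in (0.01, 0.1, 1.0):
    lower, integral, upper, by_parts = lemma_5_1(eps)
    print(f"eps={eps}: {lower:.6f} <= {integral:.6f} <= {upper:.6f} (by parts: {by_parts:.6f})")
```

In the proof of Lemma 4.4 the lemma is applied with $\epsilon = M/n$, where both bounds are of order $\eta(0)M/n$.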
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